This thesis focuses on closed-loop product grasping from supermarket shelves. The case is studied where the robot stands in front of a shelf in an Albert Heijn supermarket and is tasked with picking a desired product from that shelf. Enabling a robot to achieve this product-picking task, however, is challenging. While many robotic picking methods are centered around table-top environments, the complex geometry of supermarket shelves presents a challenge in itself. Additionally, the Albert Heijn supermarket is a dynamic environment, where other agents can rearrange the shelves, move products, and introduce lighting changes. Whereas the table-top environment allows other methods to pick objects with an open-loop controller, approaching the supermarket environment with a closed-loop picking strategy can help overcome the challenges introduced by this dynamic environment. Therefore, this thesis proposes a product-grasping pipeline, with the goal of discovering which combination and optimization of the required robotic skills results in a system that enables the robot to perform the product-picking task consistently and robustly. To close the loop, in-hand position-based visual servoing is used, which enables the robot to correct detection mistakes as it picks a product. The three robotic skills required for the product-picking pipeline are product detection, product grasp pose estimation, and product tracking. Because of visual servoing, each of these skills must run in real time. Product detection is achieved by pre-training a YOLOv6 object detector on the SKU-110K dataset and fine-tuning it on the new Albert Heijn Supermarket dataset. The Albert Heijn Supermarket dataset was created to detect 36 products in the supermarket, and it includes the challenges of distinguishing similar products and detecting relocated products. To enable detections during visual servoing, the images are captured at varying distances, viewing directions toward the shelf, and lighting conditions.
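The closed-loop idea behind position-based visual servoing can be illustrated with a minimal sketch. This is not the thesis implementation; it assumes a simplified translational-only controller (function name `pbvs_velocity` and the proportional `gain` are illustrative) where the commanded end-effector velocity is proportional to the error between the current gripper position and the continuously re-estimated grasp position:

```python
import numpy as np

def pbvs_velocity(t_cur, t_goal, gain=1.0):
    """Proportional position-based visual servoing (translation only).

    t_cur:  current gripper position in the camera/world frame (3-vector)
    t_goal: latest estimated grasp position of the desired product
    Returns a translational velocity command that drives t_cur toward
    t_goal; because t_goal is re-estimated every frame, detection errors
    are corrected during the approach rather than baked in at the start.
    """
    error = np.asarray(t_goal, dtype=float) - np.asarray(t_cur, dtype=float)
    return gain * error
```

In an open-loop scheme the goal would be fixed once before motion starts; here, re-evaluating `t_goal` each control cycle is what makes the pipeline robust to products being moved mid-pick.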
The product grasp poses for the suction cup are estimated by fitting a plane to the estimated point cloud of each product. The point cloud of a product is estimated by randomly sampling the depth data from the product region. Each product pose is then tracked over time with a Kalman filter, enabling temporal reasoning about the products. As a result, the grasp pose of the desired product can be refined as the manipulator moves toward the shelf. The proposed system achieved success rates of 90\%--100\% in experiments on a real robot with a suction cup gripper. While robust picking on a set of 36 products has been achieved, exploring a wider variety of product shapes with other robotic grippers is a compelling research direction for overcoming the challenge of picking oddly shaped products. Furthermore, a separate few-shot learning classifier for product classification could address the challenge of adding new products to the inventory or of product re-branding. Considering other picking scenarios in the supermarket, such as picking from hooks or from refrigerated shelves with doors, is also important for deployment. Finally, interacting with humans and reacting to human behavior during the picking process to ensure safety is another crucial challenge that must be overcome to move toward the integration and deployment of robotics in supermarkets.
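The plane-fit step described above can be sketched as a least-squares fit on the sampled product points. This is a minimal illustration, not the thesis code: the function name `fit_grasp_plane` is assumed, and it uses the standard SVD-based fit in which the right-singular vector with the smallest singular value is the normal of the best-fit plane through the centered points. The centroid serves as a candidate suction contact point and the normal as a candidate approach direction:

```python
import numpy as np

def fit_grasp_plane(points):
    """Least-squares plane fit to a sampled product point cloud.

    points: (N, 3) array of 3-D points sampled from the product's depth data.
    Returns (centroid, unit_normal): the centroid is a candidate suction
    contact point; the unit normal is a candidate approach direction.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # Right-singular vectors of the centered points span the principal
    # directions; the last one (smallest singular value) is the plane normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)
```

Sampling the depth data randomly (rather than using every pixel) keeps this fit cheap enough to rerun every frame, which the real-time requirement of visual servoing demands.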
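The temporal tracking of product poses can likewise be illustrated with a simplified Kalman filter. This sketch is not the thesis implementation: it assumes a constant-position motion model with a single scalar covariance `P` and illustrative noise parameters `Q` (process) and `R` (measurement), whereas a full tracker would maintain a covariance matrix per pose:

```python
import numpy as np

def kf_update(x, P, z, Q=1e-4, R=1e-2):
    """One predict/update cycle of a constant-position Kalman filter.

    x: current 3-D position estimate of a product
    P: scalar estimate variance (isotropic, for simplicity)
    z: new position measurement from the latest detection
    Returns the refined estimate and variance; repeated calls while the
    manipulator approaches the shelf refine the grasp pose over time.
    """
    # Predict: the product is assumed static; process noise Q models
    # the possibility that it was moved between frames.
    P = P + Q
    # Update: the Kalman gain K blends prediction and measurement.
    K = P / (P + R)
    x = np.asarray(x, dtype=float) + K * (np.asarray(z, dtype=float) - x)
    P = (1.0 - K) * P
    return x, P
```

Filtering each detected product independently is what enables the temporal reasoning mentioned above: a single missed or noisy detection perturbs the tracked pose only slightly instead of replacing it outright.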