Object pushing in robotics has numerous applications, but it often relies on room-bound object tracking systems such as Motion Capture (MoCap) for accurate object pose acquisition. Such systems limit the potential use scenarios: they add complexity and cost, and enlarging the operational area requires extending the sensor infrastructure. To address these limitations, we propose a framework that integrates object pushing with vision-guided object pose estimation using a monocular RGB camera mounted on the pushing agent. By leveraging the temporal coherence of the scene, we accelerate the 2D object detection stage that is common in object pose estimation pipelines. Our contributions include a novel framework that continuously pushes an object while simultaneously tracking its pose. We incorporate dynamics by reusing the predictive model of a Model Predictive Path Integral (MPPI) controller to predict the object location in the forthcoming image, which substantially reduces the computational cost of 2D object detection without sacrificing pose estimation accuracy. Our framework achieves real-time object pushing with accurate object pose estimation, which we demonstrate in real-world experiments. Furthermore, we provide a novel object pose estimation dataset in the widely used BOP format, specifically for robot planar pushing with an on-board camera. Our approach works towards eliminating the need for room-bound sensor systems, expanding the potential use cases. By considering temporal coherence and scene dynamics, our framework challenges recent object pose estimation methods that fully process each image independently of all others, while providing a promising solution for real-time object pushing.
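
The idea of predicting the object location in the forthcoming image can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes that a predictive model has already supplied the object's predicted position in the camera frame, that the camera follows an ideal pinhole model with intrinsics `K`, and that the object fits inside a cube of half-extent `half_extent`. The function names, the margin value, and the example numbers are all hypothetical. The sketch projects the cube into the image and returns a padded region of interest, so that a 2D detector would only need to process that crop rather than the full frame.

```python
import numpy as np


def project_points(points_cam, K):
    """Project Nx3 camera-frame points (z > 0) to Nx2 pixel coordinates."""
    uvw = (K @ points_cam.T).T
    return uvw[:, :2] / uvw[:, 2:3]


def predicted_roi(center_cam, half_extent, K, image_size, margin=0.2):
    """Return (x_min, y_min, x_max, y_max) around a predicted object position.

    center_cam  : predicted object center in the camera frame, in metres
                  (assumed to come from a predictive model; hypothetical input)
    half_extent : half side length of a cube that bounds the object, in metres
    image_size  : (width, height) of the image in pixels
    margin      : relative padding that absorbs prediction error (assumed value)
    """
    # Eight corners of an axis-aligned cube around the predicted center.
    signs = np.array(
        [[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
        dtype=float,
    )
    corners = np.asarray(center_cam, dtype=float) + half_extent * signs

    # Tight pixel bounding box of the projected corners, then padding.
    uv = project_points(corners, K)
    lo, hi = uv.min(axis=0), uv.max(axis=0)
    pad = margin * (hi - lo)

    # Clip the padded box to the image bounds.
    w, h = image_size
    x_min, y_min = np.clip(lo - pad, 0, [w - 1, h - 1])
    x_max, y_max = np.clip(hi + pad, 0, [w - 1, h - 1])
    return (
        int(np.floor(x_min)),
        int(np.floor(y_min)),
        int(np.ceil(x_max)),
        int(np.ceil(y_max)),
    )


# Hypothetical intrinsics for a 640x480 camera and an object 1 m ahead.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
roi = predicted_roi(center_cam=[0.0, 0.0, 1.0], half_extent=0.05,
                    K=K, image_size=(640, 480))
roi_area_fraction = (roi[2] - roi[0]) * (roi[3] - roi[1]) / (640 * 480)
```

In this toy setting the region of interest covers only a small fraction of the full image, which is the kind of saving that motivates restricting detection to a predicted region. The margin trades robustness to prediction error against crop size.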