Human centric object perception for service robots
Abstract
The research interests and applicability of robotics have diversified and grown tremendously in recent years. There has been a shift from industrial robots operating in constrained settings to consumer robots working in dynamic environments closely associated with everyday human activities. Personal service robots to assist the elderly, compliant robots with advanced perception skills for flexible manufacturing, and autonomous vehicles for safe transportation are among the promising directions. In all these cases, robots have to work in close cooperation with human users, and intuitive, higher-level interaction between robots and lay users is essential for their widespread acceptance. Hence, this thesis studies the development of cognitive and perceptual skills in humans and applies it to the development of a robot’s perceptual skills, especially those based on visual information, from a user interaction point of view.
A physical robot is developed from scratch with affordability and user acceptability in mind. The robot, LEA, has 9 DoF and incorporates a differential drive base, a 4 DoF arm with a gripper, and a pan-tilt neck supporting the robot’s head. The entire mechanics and control electronics are custom developed, leading to decreased mechanical complexity and increased flexibility in physical dimensions. All the components are integrated into a socially appealing industrial design that has been well received by the public and media. The limitations arising from simplified mechanics and affordable hardware are compensated for by advanced adaptive vision algorithms to achieve the required functionalities of a service robot. A generic human centric architecture for highly autonomous and interactive robots is proposed to integrate the various capabilities of a robot that are triggered by user interaction. The specific case of object recognition is investigated, as many tasks faced by such robots involve the perception and manipulation of different household objects.
An intuitive non-verbal interaction between a user and a robot is developed for conveying objects of interest to the robot. The developed spatial grounding model can detect the object of user interest independent of the relative position between the robot, the user and the object, and without any prior training. This is achieved by a hybrid attention system that combines bottom-up color saliency and depth images with top-down cues comprising the user’s pointing direction and gaze. The robustness of the gaze-based attention system is improved by automatically switching between a keypoint-based and a color-based approach depending on the object’s texture.
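As a rough illustration of how a bottom-up saliency score and a top-down pointing cue might be combined, the sketch below weights each candidate object's saliency by how closely its centroid lies to the pointing ray. The function names, the Gaussian angular weighting and the `sigma` parameter are assumptions made for this example and are not taken from the thesis; gaze and depth cues are left out.

```python
import math

def angle_to_ray(origin, direction, point):
    """Angle (radians) between the pointing ray and the vector origin -> point."""
    v = [p - o for p, o in zip(point, origin)]
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_d = math.sqrt(sum(a * a for a in direction))
    if norm_v == 0 or norm_d == 0:
        raise ValueError("point must differ from origin and direction must be non-zero")
    cos_angle = sum(a * b for a, b in zip(v, direction)) / (norm_v * norm_d)
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def attended_object(candidates, origin, direction, sigma=0.2):
    """Pick the candidate with the highest saliency-times-pointing score.

    candidates: name -> (3D centroid, bottom-up saliency in [0, 1])
    sigma: hypothetical angular tolerance of the pointing cue, in radians
    """
    scores = {}
    for name, (centroid, saliency) in candidates.items():
        angle = angle_to_ray(origin, direction, centroid)
        scores[name] = saliency * math.exp(-0.5 * (angle / sigma) ** 2)
    return max(scores, key=scores.get), scores

# Hypothetical scene: the book is more salient, but the user points at the cup.
candidates = {
    "cup": ((2.0, 0.0, 0.0), 0.6),
    "book": ((0.0, 2.0, 0.0), 0.9),
}
selected, scores = attended_object(candidates, origin=(0.0, 0.0, 1.0),
                                   direction=(1.0, 0.0, -0.5))
print(selected)
```

In this toy scene the top-down cue overrides the higher bottom-up saliency of the book, which is the behaviour a hybrid attention scheme is meant to provide.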
The recognition of these objects is achieved with a three-layered semantic recognition framework that can incorporate multiple modalities of information. Developed based on studies of human perception, this method achieves recognition robustness in unconstrained domestic environments while providing semantic grounding with human users. The modalities of color, shape and object location have been incorporated into this recognition model, which remains flexible enough to include additional modalities. The first layer consists of semantic grounding modules that abstract raw sensory information into a probability distribution over meaningful semantic concepts familiar to humans. The second layer operates on these semantic features to obtain an object hypothesis for each individual modality. The last layer performs knowledge association to estimate a combined probability over known objects and obtain the final inference.
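The three layers described above can be sketched in a few lines. This is a minimal illustration only: the layer-1 outputs are given directly as concept probabilities, layer 2 scores objects by the overlap between observed and expected concepts, and layer 3 fuses modalities with a normalized product, which assumes the modalities are independent. The object models, concept names and the fusion rule are assumptions for the example, not the thesis's actual formulation.

```python
def normalize(scores):
    """Scale non-negative scores so that they sum to one."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all scores are zero; cannot normalize")
    return {key: value / total for key, value in scores.items()}

def modality_hypothesis(concept_probs, object_models):
    """Layer 2: object hypothesis for a single modality.

    concept_probs: P(concept | observation), the output of a layer-1 module
    object_models: object -> P(concept | object), hypothetical prior knowledge
    """
    scores = {
        obj: sum(concept_probs.get(concept, 0.0) * p for concept, p in model.items())
        for obj, model in object_models.items()
    }
    return normalize(scores)

def fuse(hypotheses):
    """Layer 3: combine per-modality hypotheses with a normalized product."""
    combined = {obj: 1.0 for obj in hypotheses[0]}
    for hypothesis in hypotheses:
        for obj in combined:
            combined[obj] *= hypothesis[obj]
    return normalize(combined)

# Layer 1 outputs (assumed): distributions over human-familiar concepts.
color_obs = {"red": 0.8, "yellow": 0.2}
shape_obs = {"round": 0.7, "elongated": 0.3}

# Hypothetical knowledge about two known objects.
color_models = {"apple": {"red": 0.9, "yellow": 0.1},
                "banana": {"red": 0.05, "yellow": 0.95}}
shape_models = {"apple": {"round": 0.9, "elongated": 0.1},
                "banana": {"round": 0.1, "elongated": 0.9}}

result = fuse([modality_hypothesis(color_obs, color_models),
               modality_hypothesis(shape_obs, shape_models)])
best = max(result, key=result.get)
print(best, result)
```

Adding a further modality, such as object location, would only require one more entry in the list passed to `fuse`.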
A novel algorithm is developed to track the contours of objects and persons, allowing exploration from different viewpoints. The visual model of the target is refined by considering only the dominant 3D cluster within the initial bounding box. A tracking-by-detection algorithm constrains the search space in the image by removing regions based on the metric size constancy of the object and other structural patterns such as perpendicular planes. A feature based on the Color Naming System is used with an online learning classifier to obtain a color probability map, while the depth probability map is obtained using a Gaussian model of the object’s depth distribution. An optimal fusion of the different object modalities, based on a target-background dissimilarity measure, is developed and used in a graph cut framework to continuously obtain contours of the target object.
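The depth probability map and the dissimilarity-weighted fusion can be illustrated as follows. The sketch fits a Gaussian to depth samples from the target, evaluates an unnormalized Gaussian likelihood per pixel, and weights each modality by the absolute difference between its mean probability inside and outside a target mask. That dissimilarity measure, the `min_sigma` floor and all function names are assumptions for illustration; the thesis's own measure and the graph cut step are not reproduced here.

```python
import math
from statistics import mean, pstdev

def depth_probability(depth_map, target_depths, min_sigma=0.01):
    """Per-pixel Gaussian likelihood (peak value 1) of belonging to the target."""
    mu = mean(target_depths)
    sigma = max(pstdev(target_depths), min_sigma)  # floor avoids division by zero
    return [[math.exp(-0.5 * ((d - mu) / sigma) ** 2) for d in row]
            for row in depth_map]

def dissimilarity(prob_map, mask):
    """Assumed measure: gap between mean probability inside and outside the mask."""
    inside = [p for prow, mrow in zip(prob_map, mask)
              for p, m in zip(prow, mrow) if m]
    outside = [p for prow, mrow in zip(prob_map, mask)
               for p, m in zip(prow, mrow) if not m]
    if not inside or not outside:
        raise ValueError("mask must contain both target and background pixels")
    return abs(mean(inside) - mean(outside))

def fuse_maps(maps, mask):
    """Weighted sum of probability maps; more discriminative maps weigh more."""
    weights = [dissimilarity(m, mask) for m in maps]
    total = sum(weights)
    if total == 0:
        weights = [1.0 / len(maps)] * len(maps)
    else:
        weights = [w / total for w in weights]
    fused = [[sum(w * m[r][c] for w, m in zip(weights, maps))
              for c in range(len(mask[0]))]
             for r in range(len(mask))]
    return fused, weights

# Toy 2x3 scene: target at about 1 m in the left columns, background at about 2.5 m.
depth_map = [[1.00, 1.02, 2.5],
             [0.98, 1.00, 2.6]]
mask = [[1, 1, 0],
        [1, 1, 0]]
color_map = [[0.9, 0.8, 0.6],
             [0.9, 0.7, 0.5]]  # hypothetical output of a color classifier

depth_map_prob = depth_probability(depth_map, [1.00, 1.02, 0.98, 1.00])
fused, weights = fuse_maps([depth_map_prob, color_map], mask)
print(weights)
```

Here depth separates target from background more sharply than color does, so it receives the larger weight. In a full tracker the fused map would supply the unary terms of the segmentation step.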
The reliability of recognizing these objects in challenging domestic environments is enhanced by using visual appearances from multiple views while also incorporating the spatial relations between these viewpoints. A sequence alignment algorithm is used with vector-quantized features from each view to achieve viewpoint correlation in object recognition. A fast visual odometry estimate is used to obtain viewpoint relations in an unsupervised manner, and this is combined with segmentation to provide a standalone system that can be used in real-world scenarios. The system is made generic so that it can be used with different feature vectors, and a benchmark is created to compare, for different feature vectors, the performance improvement it achieves over single-view object recognition.
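To make the idea of aligning view sequences concrete, the sketch below treats each view as a single codeword ID (the result of vector quantization) and scores a query sequence against stored object sequences with Needleman-Wunsch global alignment. The abstract does not say which alignment algorithm is used, so Needleman-Wunsch, the scoring values and the example sequences are assumptions chosen for illustration.

```python
def alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score between two codeword sequences."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty sequence costs one gap per element.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,   # gap in seq_b
                              score[i][j - 1] + gap)   # gap in seq_a
    return score[-1][-1]

# Hypothetical codeword IDs, one per view, ordered by viewpoint.
query_views = [3, 7, 7, 2]
object_models = {
    "mug": [3, 7, 2],
    "box": [5, 5, 1, 4],
}
scores = {name: alignment_score(query_views, views)
          for name, views in object_models.items()}
recognized = max(scores, key=scores.get)
print(recognized, scores)
```

Because the alignment tolerates gaps, a query that revisits or skips a viewpoint can still match the stored sequence, which is one reason alignment suits multi-view data captured by a moving robot.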
Object recognition in service robots can be augmented by incorporating the context of objects’ use within the developed semantic recognition framework. The utility of an object can be understood from the actions the user performs on it; hence, an action recognition system based on human skeletal tracking is developed, with a novelty detection method to facilitate the incremental learning of new actions. Compact representations of skeletal structure are obtained using a Torso-PCA transform and are used as observations for an HMM-based system to recognize user actions. Uncertainty in predictions, quantified as confidence measures, is thresholded to detect unknown actions. These confidence measures are obtained through background models, and different methods are evaluated with respect to the sensitivity and specificity of recognition performance.
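One common way to turn model likelihoods into a thresholdable confidence is a per-frame log-likelihood ratio between the best action model and a background model. The sketch below assumes the HMM log-likelihoods have already been computed and shows only the thresholding step. The ratio-based confidence, the threshold value and the action names are assumptions for this example; the thesis evaluates several such methods without detailing them in the abstract.

```python
def classify_with_novelty(action_loglik, background_loglik, n_frames, threshold):
    """Return (label, confidence); label is None when the action looks unknown.

    action_loglik: action name -> log-likelihood of the observed sequence
    background_loglik: log-likelihood under a background (catch-all) model
    n_frames: sequence length, used to make the confidence length-independent
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    best = max(action_loglik, key=action_loglik.get)
    confidence = (action_loglik[best] - background_loglik) / n_frames
    label = best if confidence >= threshold else None
    return label, confidence

# Hypothetical log-likelihoods for two 40-frame skeleton sequences.
known = classify_with_novelty({"drink": -120.0, "pour": -150.0},
                              background_loglik=-140.0, n_frames=40, threshold=0.25)
unknown = classify_with_novelty({"drink": -138.0, "pour": -139.0},
                                background_loglik=-140.0, n_frames=40, threshold=0.25)
print(known, unknown)
```

Raising the threshold flags more sequences as unknown, trading sensitivity on known actions for better rejection of novel ones, so the value would be chosen from the sensitivity and specificity analysis.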
Various algorithms are developed to enhance the reliability of object perception overcoming challenges posed by dynamic environments and affordable hardware by incorporating different modalities of information available to a robot. The development of algorithms in this direction is significant as these concepts can be readily extended to incorporate user and environment recognition to complete the perceptual capabilities of robots …