Exploring trade-offs in on-board versus cloud-based social robots
Abstract
Social robotics is an emerging field in computer science. Most commercially available social robots lack fast hardware components. As a result, their built-in software shows low accuracy and performance in, among other things, speech recognition, facial recognition, and dialog during social interaction with users. Cloud computation, with its massive computational power, offers state-of-the-art techniques, performance, and accuracy, but at extra cost and with increased latency.
In this work, we extend and improve a social robot's standard capabilities and performance by making use of cloud computation. This thesis explores the trade-offs that arise when replacing or augmenting built-in robot software with IBM cloud services, using the humanoid Pepper robot from SoftBank.
The exploration was guided by a hospitality use case demonstrated in the offices of two companies: a Dutch health insurer and IBM Netherlands. Two products were developed for this single use case, each based on a different development toolbox. The first toolbox contains all development software from the robot's manufacturer (the NAOqi toolbox), while the second makes use of cloud services (the Watson toolbox). Using the product built with the NAOqi toolbox, we evaluate interactions with real users and obtain baseline data and experiences. After evaluating the second product, built with the Watson toolbox, we compare the two products on human-robot interaction quality, robot component quality, development methods, software engineering complexity, and Total Cost of Ownership.
The main findings include an overview of relevant test metrics and test methods for a social robot's components, including acquired data for some components of the Pepper robot. We show possible architectures for a (semi-)cloud-based system and their trade-offs. Evaluations show that the cloud-based system indeed performs better and achieves higher human-robot interaction quality than the product built with the NAOqi toolbox, yet it has downsides such as latency and operating costs. This is also reflected in the analysis of single components, where Speech-to-Text from the cloud in particular shows a significant increase in performance and capabilities. Based on an analysis of IBM Cloud pricing structures and operating costs, we show that a mix of toolboxes results in the best-working and cheapest social robot when considering Total Cost of Ownership. Finally, we contribute to the currently available knowledge on this subject with a decision matrix that combines all of the above information in a compact form accessible to people not knowledgeable in the hospitality robot or cloud domains. With the matrix, early-development advice for creating a social robot can be formulated using the data gathered in this thesis.
Taking a broad approach, this research focuses on finding and discussing trade-offs rather than on an in-depth analysis of all components. Providing methods to determine the data, findings, and trade-offs mentioned above is more important than the actual numbers reported in this thesis, because the domain advances quickly and those numbers are expected to change often. The end product built using the Watson toolbox is an improvement on multiple levels, yet it is still not always able to finish all intended interactions autonomously and correctly. However, its capabilities, performance, and robustness are closer to the level required for commercial use. The techniques we use to extend Pepper's capabilities could be applied to any social robot.