Voice Based Interfaces for Supermarket robots using Large Language Models

Abstract

This thesis presents the design and evaluation of a comprehensive system for developing voice-based interfaces to support users in supermarkets. These interfaces enable customers to convey their needs through both generic and specific queries. While current state-of-the-art systems like GPTs by OpenAI are easily accessible and adaptable, featuring low-code deployment with options for functional integration, they still face challenges such as increased response times and limited strategic control over use-case tailoring and cost optimisation. Motivated by the goal of crafting inclusive, personalised, and efficient conversational agents, this study advances on three fronts: 1) a comparative analysis of four popular off-the-shelf speech recognition technologies to identify the most accurate model for different genders (male/female) and languages (English/Dutch); 2) an assessment of the effects of personalised recommendations versus generic responses, using a blindfolded, counterbalanced within-subject experiment; and 3) the development and evaluation of a novel multi-LLM supermarket chatbot framework, comparing its performance with a specialised GPT model powered by GPT-4 Turbo, using the Artificial Social Agent Questionnaire (ASAQ) in a counterbalanced within-subjects experiment and qualitative participant feedback. Our findings reveal that OpenAI’s Whisper leads in speech recognition accuracy across genders and languages, that users significantly prefer personalised chatbots over their non-personalised counterparts, and that our proposed multi-LLM chatbot architecture outperformed the benchmarked GPT model across all 13 measured criteria, including statistically significant improvements in four key areas: performance, user satisfaction, user-agent partnership, and self-image enhancement.
The thesis concludes by presenting a simple method for supermarket robot navigation: the final chatbot response is mapped to the correct shelf numbers, for which the robot can then plan sequential visits. This in turn enables effective use of low-level perception, motion planning, and control capabilities for product retrieval and collection. We hope this work encourages further research into using multiple, specialised smaller models instead of always relying on a single powerful model.
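
The response-to-shelf mapping mentioned above could look roughly like the following minimal Python sketch. It is an illustration under stated assumptions, not the method used in the thesis: the product table `SHELF_OF`, the function name `shelves_to_visit`, the keyword matching, and the ascending-shelf visiting order are all hypothetical.

```python
import re

# Hypothetical product-to-shelf table; the real store layout is not
# described in the abstract.
SHELF_OF = {
    "milk": 12,
    "bread": 3,
    "eggs": 12,
    "pasta": 7,
}


def shelves_to_visit(response: str, shelf_of: dict[str, int] = SHELF_OF) -> list[int]:
    """Return the shelf numbers of all known products named in a chatbot
    response, deduplicated and sorted in ascending order.

    Ascending order is only a stand-in for a real visiting plan (assumption).
    """
    # Lowercase the response and split it into alphabetic words.
    words = set(re.findall(r"[a-z]+", response.lower()))
    # Collect the shelf of every product whose name appears as a word.
    shelves = {shelf for product, shelf in shelf_of.items() if product in words}
    return sorted(shelves)
```

For example, `shelves_to_visit("You could buy milk, eggs and pasta.")` yields shelves 7 and 12, since milk and eggs share a shelf in the assumed table. A real system would likely need more robust product matching and a route planner that accounts for the store layout.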

Files

MSc_Thesis_3_.pdf
(pdf | 4.55 Mb)
Unknown license