Evaluating the Efficacy and User Reliance on RAG Model Outputs
A comparative study with human experts
Abstract
The emergence of conversational AI systems such as ChatGPT and Microsoft Copilot has changed how users retrieve information.
Retrieval Augmented Generation (RAG) combines the generative capabilities of Large Language Models (LLMs) with retrieval over unstructured data, creating new opportunities in science and business.
RAG-based models have gained popularity, but their effectiveness, and the extent to which users come to rely on them in organizational settings, remain underexplored. This thesis presents a user study in which policy experts in the financial domain
were tasked with text aggregation using a basic RAG model. The study examines the model's performance and how the experts' reliance on it developed over four weeks.
Our key findings reveal that RAG-assisted outputs do not match the quality of texts produced by human experts.
The RAG model, however, excels in specific aspects such as structure, spelling, and grammar.
Additionally, the experts express satisfaction with the efficiency of RAG. Our findings suggest that user reliance on RAG increases with experience.
This underscores the need for interventions and policies to support responsible human-AI collaboration.
This work measures the temporal development of user reliance within a RAG system while assessing the system's efficacy in a field study with policy experts in the financial domain.
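
For readers unfamiliar with the retrieve-then-generate pattern underlying RAG, the sketch below illustrates the basic loop in Python. It is a minimal, hypothetical illustration: the thesis does not disclose its implementation, and the keyword-overlap retriever here stands in for the embedding- or BM25-based retrieval a real system would use. All names (Document, score, retrieve, build_prompt) are assumptions, not the study's code.

# Minimal, hypothetical retrieve-then-generate sketch; not the thesis implementation.
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str


def score(query: str, doc: Document) -> int:
    # Naive keyword-overlap relevance; real systems typically use
    # dense embeddings or BM25 instead.
    query_terms = set(query.lower().split())
    doc_terms = set(doc.text.lower().split())
    return len(query_terms & doc_terms)


def retrieve(query: str, corpus: list[Document], k: int = 3) -> list[Document]:
    # Rank the unstructured corpus by relevance and keep the top-k passages.
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]


def build_prompt(query: str, passages: list[Document]) -> str:
    # Ground the LLM by prepending the retrieved passages to the question.
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    corpus = [
        Document("p1", "Capital requirements for banks were revised in 2023."),
        Document("p2", "Liquidity coverage ratios apply to large institutions."),
        Document("p3", "The cafeteria menu changes weekly."),
    ]
    query = "What changed in bank capital requirements?"
    prompt = build_prompt(query, retrieve(query, corpus, k=2))
    print(prompt)  # In a real pipeline, this prompt is sent to an LLM for generation.

In a deployed RAG system, the quality of the final answer depends on the retrieval step as much as on the generation step, which is one reason output quality can fall short of expert-written text.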