Enabling Domain Expert Evaluation of Emerging AI Technologies in Healthcare Settings

Introducing a Dual-Core Evaluation Framework for Conversational AI in Healthcare


Abstract

In the rapidly evolving healthcare landscape, integrating Artificial Intelligence (AI), particularly Large Language Models (LLMs), presents both significant opportunities and complex challenges. This study examines the efficacy and implications of LLMs within healthcare systems, with a focus on improving user-centric evaluation methods for AI-generated healthcare advice. The research centres on three critical areas: identifying the limitations of current evaluation methods, addressing the accuracy and reliability of LLM-generated advice, and proposing improvements to evaluation frameworks that support the practical application of these models in healthcare settings.

Our study uses a chatbot prototype trained specifically on healthcare datasets relevant to the Dutch context and explores its application in real-world scenarios to validate and refine evaluation metrics. By involving healthcare professionals in interactions with the chatbot, we ground our findings in practical, user-based experience. Engagement with the prototype reveals key insights into the AI’s performance and underscores the need for models that not only generate reliable and ethical responses but also align with professional healthcare practice.

The proposed research contributes to the broader discourse on AI in healthcare by offering a novel framework for assessing AI-generated responses through a blend of empirical user studies and theoretical analysis. This framework aims to reduce the subjectivity of current evaluations and to provide a more robust, standardized approach to assessing the impact of AI technologies on healthcare outcomes. Through this research, we aim to forge a path toward more responsive, responsible, and user-centred AI tools in healthcare that align with both professional standards and patient needs.