Exploring Stance Detection of Opinion Texts: Evaluating the Performance of a Large Language Model

Benchmarking the Performance of Stance Classification by GPT-3.5-Turbo

Abstract

In April 2020, a Dutch research team swiftly analyzed public opinions on COVID-19 lockdown relaxations. However, due to time constraints, only a small amount of opinion data could be processed. With the surge in popularity of Natural Language Processing (NLP) and the arrival of tools like ChatGPT, many such tasks can now be delegated to Large Language Models (LLMs). This study assesses the effectiveness of these LLMs on stance detection using this COVID-19 opinion corpus. The corpus is chunked and sampled to serve as input for OpenAI's GPT-3.5-Turbo LLM. The machine-generated stances are then evaluated with multiple binary classification metrics. We show that these models perform very well at stance detection, achieving an average F-score of 0.895. However, a significant number of misclassifications are observed in one dataset. We therefore conclude that while LLMs offer valuable guidance, it remains crucial to verify their outputs when dealing with complex or important public matters.
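The evaluation step described above can be sketched in a few lines. The snippet below is a minimal illustration, not the study's actual pipeline: the stance labels (`"support"`/`"oppose"`) and the example gold and predicted lists are invented for this sketch, and only the F-score is computed by hand from true/false positives and false negatives.

```python
def f_score(gold, predicted, positive="support"):
    """Compute precision, recall, and F1 for one stance label,
    treating `positive` as the positive class of a binary task."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Invented example: human-annotated gold stances vs. LLM-generated stances.
gold = ["support", "oppose", "support", "support", "oppose"]
pred = ["support", "support", "support", "oppose", "oppose"]
precision, recall, f1 = f_score(gold, pred)
```

In practice one would compute such scores per stance class over the sampled chunks and average them, which is how a summary figure like the reported 0.895 F-score arises.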