Leveraging Large Language Models for Classifying Subjective Arguments in Public Discourse
Abstract
This study investigates the effectiveness of Large Language Models (LLMs) in identifying and classifying subjective arguments within deliberative discourse. Using data from a Participatory Value Evaluation (PVE) conducted in the Netherlands, the research introduces an annotation strategy for identifying arguments and extracting their premises. The Llama 2 model is then used to test three prompting approaches: zero-shot, one-shot, and few-shot. Performance is evaluated with the cosine similarity metric and subsequently improved by introducing chain-of-thought prompting. The results show that zero-shot prompting unexpectedly outperforms one-shot and few-shot prompting because the LLM overfits to the examples provided. Chain-of-thought prompting is shown to improve the argument identification task. The subjectivity of the annotation task is reflected in the low average pairwise F1 score between annotators and in the considerable variance in the number of data items each annotator marked as not being arguments. This subjectivity is further highlighted by a pairwise chain-of-thought prompting analysis, which shows that annotators with more similar annotations received more similar LLM responses.
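To make the evaluation setup concrete, the sketch below shows how a cosine similarity score between an LLM-extracted premise and a human-annotated premise could be computed over sentence embeddings. This is a minimal illustration, not the thesis code: the embedding model, function name, and example sentences are assumptions introduced for demonstration.

```python
# Minimal sketch (not the thesis code): scoring an LLM-extracted premise
# against a human-annotated premise via cosine similarity of sentence embeddings.
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical embedding model; the thesis does not specify which encoder was used.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def premise_similarity(llm_premise: str, gold_premise: str) -> float:
    """Cosine similarity between an LLM response and an annotated premise."""
    a, b = embedder.encode([llm_premise, gold_premise])
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative example with invented sentences.
score = premise_similarity(
    "The policy should wait because hospitals are still overloaded.",
    "Hospitals remain under heavy strain, so relaxing the measures should be postponed.",
)
print(f"cosine similarity: {score:.3f}")
```

A score near 1 indicates that the model's extracted premise is semantically close to the annotator's premise, which is the intuition behind using cosine similarity to compare prompting strategies.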