Leveraging Large Language Models for Classifying Subjective Arguments in Public Discourse


Abstract

This study investigates the effectiveness of Large Language Models (LLMs) in identifying and classifying subjective arguments within deliberative discourse. Using data from a Participatory Value Evaluation (PVE) conducted in the Netherlands, the research introduces an annotation strategy for identifying arguments and extracting their premises. The Llama 2 model is then used to test three prompting approaches: zero-shot, one-shot, and few-shot. Performance is evaluated using the cosine similarity metric and is subsequently improved by introducing chain-of-thought prompting. The results show that zero-shot prompting unexpectedly outperforms one-shot and few-shot prompting, because the LLM overfits to the examples provided. Chain-of-thought prompting is shown to improve performance on the argument identification task. The subjectivity of the annotation task is reflected in the low averaged pairwise F1 score between annotators and in the considerable variance in the number of data items each annotator marked as not being arguments. This subjectivity is further highlighted by a pairwise chain-of-thought prompting analysis, which shows that annotators with more similar annotations received more similar LLM responses.
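As a rough illustration of the setup described above, the sketch below shows how zero-, one-, and few-shot prompts can be assembled and how a generated premise can be scored against a reference with cosine similarity. The prompt wording, the `generate_with_llama2` placeholder, and the TF-IDF representation are assumptions for illustration only; the thesis's exact prompts, Llama 2 configuration, and text representation may differ.

```python
# Hedged sketch, not the thesis's implementation: prompt construction for the
# zero-/one-/few-shot settings and a cosine-similarity score for the output.
# `generate_with_llama2` is a hypothetical placeholder for a Llama 2 call, and
# TF-IDF vectors stand in for whatever text representation the study used.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_prompt(argument: str, examples: list[tuple[str, str]]) -> str:
    """Zero-shot with no examples, one-shot with one, few-shot with several."""
    shots = "".join(
        f"Argument: {arg}\nPremise: {premise}\n\n" for arg, premise in examples
    )
    return (
        "Extract the premise underlying the following argument.\n\n"
        f"{shots}Argument: {argument}\nPremise:"
    )


def generate_with_llama2(prompt: str) -> str:
    """Placeholder for an actual Llama 2 inference call (local or hosted)."""
    raise NotImplementedError


def cosine_score(model_output: str, reference_premise: str) -> float:
    """Cosine similarity between TF-IDF vectors of the output and the reference."""
    vectors = TfidfVectorizer().fit_transform([model_output, reference_premise])
    return float(cosine_similarity(vectors[0:1], vectors[1:2])[0, 0])
```

The averaged pairwise F1 agreement measure mentioned in the abstract can be sketched in a similar spirit: compute F1 between every pair of annotators' labels and average the scores. The annotator names and label set below are illustrative, not taken from the thesis.

```python
# Hedged sketch of averaged pairwise F1 as an inter-annotator agreement measure.
from itertools import combinations
from statistics import mean

from sklearn.metrics import f1_score


def averaged_pairwise_f1(annotations: dict[str, list[str]]) -> float:
    """`annotations` maps an annotator id to one label per data item."""
    scores = [
        f1_score(annotations[a], annotations[b], average="macro", zero_division=0)
        for a, b in combinations(annotations, 2)
    ]
    return mean(scores)


example = {
    "annotator_1": ["argument", "argument", "not_argument"],
    "annotator_2": ["argument", "not_argument", "not_argument"],
    "annotator_3": ["argument", "argument", "argument"],
}
print(averaged_pairwise_f1(example))  # low values indicate weak agreement
```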
