Natural Language Counterfactual Explanations in Financial Text Classification
Abstract
Central banks communicate their monetary policy plans to the public through meeting minutes and transcripts. These communications can have immense effects on markets and are often the subject of studies in the financial literature. Recent advances in Natural Language Processing have prompted researchers to analyze these communications with Transformer-based Large Language Model (LLM) classifiers. The use of LLMs in finance and other high-stakes domains calls for a high level of trustworthiness and explainability of these models. We focus on Counterfactual Explanations (CEs), a form of Explainable AI that explains a model's classification by proposing a minimally altered version of the original input that the model would classify differently. We apply three types of CE generators to LLM classifiers on a recent dataset of sentences drawn from Federal Open Market Committee (FOMC) communications and assess the usability of their explanations. We perform three experiments comparing the generators: one using a selection of quantitative metrics and two involving human evaluators, including central bank employees. Our findings suggest that both non-expert and expert evaluators prefer counterfactual methods that apply minimal changes to the text; however, the methods we analyze may not handle domain-specific vocabulary well enough to generate plausible explanations for our task. We discuss shortcomings in the choice of evaluation metrics in the literature on text CE generators and propose refined definitions of the fluency and plausibility qualitative metrics.