Metrics to Ascertain the Plausibility and Faithfulness of Counterfactual Explanations

Bachelor thesis (2024)

Authors

A.F. Yücel (Electrical Engineering, Mathematics and Computer Science)

Contributors

P. Altmeyer (mentor)

C.C.S. Liem (mentor)

B.J.W. Dudzik, Pattern Recognition and Bioinformatics (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:d80b688c-b0f6-4c88-a0a2-891d738f25d4

Published Date

27-06-2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Counterfactual explanations (CEs) are essential for understanding the predictions of black-box models: they suggest minimal changes to input features that would alter the model's output. Despite their importance in Explainable AI (XAI), standardized metrics for assessing the plausibility and faithfulness of these explanations are lacking. This paper reviews evaluation procedures in the literature and proposes novel formal metrics for both properties that address the limitations of existing approaches. Plausibility is defined as the coherence of an explanation with the true data-generating process, while faithfulness refers to how accurately an explanation represents the model's reasoning. We discuss the shortcomings of existing evaluation procedures and metrics for plausibility and faithfulness, then compare our proposed metrics with existing ones, highlighting the advantages and disadvantages of each. The proposed metrics are empirically validated through experiments across multiple models and datasets, demonstrating their model-agnostic nature and reliability. Our findings indicate that the proposed metrics provide a correct and reliable means of quantifying the plausibility and faithfulness of counterfactual explanations, allowing their feasibility and trustworthiness to be gauged consistently.

Files

Research_Paper-37.pdf

(pdf | 0.242 Mb)

Unknown license