Metrics to Ascertain the Plausibility and Faithfulness of Counterfactual Explanations

Abstract

Counterfactual Explanations (CEs) are essential for understanding the predictions of black-box models: they suggest minimal changes to input features that would alter the model's output. Despite their importance in Explainable AI (XAI), there are no standardized metrics for assessing the plausibility and faithfulness of these explanations. This paper reviews evaluation procedures in the literature and proposes novel formal metrics for both properties that address the limitations of existing approaches. Plausibility is defined as the coherence of an explanation with the true data-generating process, while faithfulness refers to how accurately an explanation represents the model's reasoning. We discuss the shortcomings of existing evaluation procedures and metrics for plausibility and faithfulness, then compare our proposed metrics with the existing ones, highlighting the advantages and disadvantages of each. The proposed metrics are empirically validated through experiments across multiple models and datasets, which demonstrate that they are model-agnostic and reliable. Our findings indicate that the proposed metrics provide a correct and reliable means of quantifying the plausibility and faithfulness of counterfactual explanations, allowing their feasibility and trustworthiness to be gauged consistently.

Files

Research_Paper-37.pdf
(pdf | 0.242 Mb)
Unknown license