Powerful predictive AI systems have demonstrated great potential in augmenting human decision making. Recent empirical work has argued that the vision for optimal human-AI collaboration requires 'appropriate reliance' on AI systems. However, accurately estimating the trustworthiness of AI advice at the instance level is quite challenging, especially in the absence of performance feedback pertaining to the AI system. In practice, the performance disparity of machine learning models on out-of-distribution data makes dataset-specific performance feedback unreliable in human-AI collaboration. Inspired by existing literature on critical thinking and mindsets, we propose debugging an AI system as an intervention to foster appropriate reliance. This paper explores whether a critical evaluation of AI performance within a debugging setting can better calibrate users' assessment of an AI system. Through a quantitative empirical study (N = 234), we found that our proposed debugging intervention does not work as expected in facilitating appropriate reliance. Instead, we observe a decrease in reliance on the AI system, potentially resulting from early exposure to the AI system's weaknesses. Our findings have important implications for designing effective interventions to facilitate appropriate reliance and better human-AI collaboration.