Bridging the Emotional Gap: Evaluating Stable Diffusion’s Capability in Generating Context-Appropriate Emotions

A Systematic Analysis of Fear and Anger Depiction Using EmotionBench

Abstract

The ability of generative AI models to accurately depict emotional expressions is crucial for their use in virtual communication and entertainment. This study evaluates Stable Diffusion’s capability to generate context-appropriate emotional expressions, focusing on fear and anger. We address three key questions: (1) How accurately can Stable Diffusion generate these emotions? (2) How does the specificity of textual prompts influence accuracy? (3) Are there observable biases in the generated expressions? Using scenarios from the EmotionBench dataset, we designed prompts to evoke specific emotional responses and analyzed the images with GPT-4V, an advanced emotion recognition model. Our findings indicate that while Stable Diffusion can generate fear with reasonable accuracy, it struggles with anger, suggesting a potential bias influenced by training data. Additionally, the model faces challenges in depicting emotions in complex contexts, such as scenes involving a person inside a car or in dark settings. We found no consistent impact of prompt specificity on accuracy, indicating that other factors may influence performance. These insights highlight the need for diverse and high-quality training data and improved evaluation frameworks. Future research should incorporate a broader range of emotions, involve human evaluators, and standardize prompt specificity to enhance the reliability and comprehensiveness of AI-generated emotional expressions. Our study underscores the importance of these improvements for developing emotionally aligned and context-aware AI systems, ultimately enhancing human-computer interactions.
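To make the generate-then-recognize pipeline described above concrete, the sketch below shows one plausible implementation, assuming the Hugging Face diffusers library for Stable Diffusion and the OpenAI Python SDK for GPT-4V. The model identifiers, the prompt template built from an EmotionBench-style scenario, and the one-word-label parsing are illustrative assumptions, not the exact setup used in the paper.

```python
# Hypothetical sketch of the evaluation pipeline: generate an image intended
# to evoke a target emotion, then ask GPT-4V which emotion it recognizes.
# Model names, prompt template, and label parsing are assumptions.
import base64
import io

import torch
from diffusers import StableDiffusionPipeline
from openai import OpenAI

# 1. Generate an image from a scenario-based prompt with Stable Diffusion.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

scenario = "a person alone in a dark forest hearing footsteps behind them"
prompt = f"A photo of a person showing fear: {scenario}"  # assumed template
image = pipe(prompt).images[0]

# 2. Encode the image and query GPT-4V for the expressed emotion.
buffer = io.BytesIO()
image.save(buffer, format="PNG")
image_b64 = base64.b64encode(buffer.getvalue()).decode()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Which emotion does the person in this image express? "
                     "Answer with one word (e.g. fear, anger, neutral)."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)

# 3. Score the prediction against the target emotion.
predicted = response.choices[0].message.content.strip().lower()
print(f"target=fear predicted={predicted} match={predicted == 'fear'}")
```

Running this over many EmotionBench scenarios per target emotion and averaging the match rate would yield the kind of per-emotion accuracy the study reports.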
