Bridging the Emotional Gap: Evaluating Stable Diffusion’s Capability in Generating Context-Appropriate Emotions

A Systematic Analysis of Fear and Anger Depiction Using EmotionBench

Abstract

The ability of generative AI models to accurately depict emotional expressions is crucial for their use in virtual communication and entertainment. This study evaluates Stable Diffusion’s capability to generate context-appropriate emotional expressions, focusing on fear and anger. We address three key questions: (1) How accurately can Stable Diffusion generate these emotions? (2) How does the specificity of textual prompts influence accuracy? (3) Are there observable biases in the generated expressions? Using scenarios from the EmotionBench dataset, we designed prompts to evoke specific emotional responses and analyzed the images with GPT-4V, an advanced emotion recognition model. Our findings indicate that while Stable Diffusion can generate fear with reasonable accuracy, it struggles with anger, suggesting a potential bias influenced by training data. Additionally, the model faces challenges in depicting emotions in complex contexts, such as scenes involving a person inside a car or in dark settings. We found no consistent impact of prompt specificity on accuracy, indicating that other factors may influence performance. These insights highlight the need for diverse and high-quality training data and improved evaluation frameworks. Future research should incorporate a broader range of emotions, involve human evaluators, and standardize prompt specificity to enhance the reliability and comprehensiveness of AI-generated emotional expressions. Our study underscores the importance of these improvements for developing emotionally aligned and context-aware AI systems, ultimately enhancing human-computer interactions.
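To make the generate-then-recognize pipeline described above concrete, the sketch below shows one plausible implementation, assuming the Hugging Face diffusers library for Stable Diffusion and the OpenAI Python SDK for GPT-4V. The model identifiers, the prompt template built from an EmotionBench-style scenario, and the one-word-label parsing are illustrative assumptions, not the exact setup used in the paper.

```python
# Hypothetical sketch of the evaluation pipeline: generate an image intended
# to evoke a target emotion, then ask GPT-4V which emotion it recognizes.
# Model names, prompt template, and label parsing are assumptions.
import base64
import io

import torch
from diffusers import StableDiffusionPipeline
from openai import OpenAI

# 1. Generate an image from a scenario-based prompt with Stable Diffusion.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

scenario = "a person alone in a dark forest hearing footsteps behind them"
prompt = f"A photo of a person showing fear: {scenario}"  # assumed template
image = pipe(prompt).images[0]

# 2. Encode the image and query GPT-4V for the expressed emotion.
buffer = io.BytesIO()
image.save(buffer, format="PNG")
image_b64 = base64.b64encode(buffer.getvalue()).decode()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Which emotion does the person in this image express? "
                     "Answer with one word (e.g. fear, anger, neutral)."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)

# 3. Score the prediction against the target emotion.
predicted = response.choices[0].message.content.strip().lower()
print(f"target=fear predicted={predicted} match={predicted == 'fear'}")
```

Running this over many EmotionBench scenarios per target emotion and averaging the match rate would yield the kind of per-emotion accuracy the study reports.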
