Multi-representation Emotion Recognition in Immersive Environments

Abstract

This study addresses the gap in fine-grained emotion recognition in immersive environments using only data from on-board sensors. Two representations of the user's eyes are used: periocular recordings and eye-movement data (gaze estimates and pupil measurements). A novel multi-representation method is proposed that combines a dedicated feature extractor for each representation with an effective feature-fusion technique. The method significantly outperforms baselines that use only a single representation or that incorporate the content stimuli. It achieves an F1-score of 0.85 when 10% of the data (approximately 40 seconds covering all emotions) is used for personal adaptation and emotions are recognized while users watch unseen parts of the stimuli used for adaptation. In a more practical scenario, the method achieves an F1-score of 0.71 with five seconds of personal adaptation data per emotion, recognizing emotions while users watch completely unseen stimuli. Under the same condition, but with only one second of adaptation data available, the proposed method achieves an F1-score of 0.68.
Furthermore, the study demonstrates that estimated labels can substitute for user-provided labels without sacrificing recognition performance, eliminating the need for users to manually label emotion-elicitation segments.
Future work will focus on improving performance by allocating more computational resources and making architectural modifications, conducting deeper investigations into the decision-making process, and developing real-time recognition systems for in-the-wild experiments.
The results of this study suggest that more engaging, adaptive, and personalized experiences can be developed in immersive environments.
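For illustration only, the sketch below shows one way the multi-representation idea described above could be structured: a separate encoder per representation (periocular frames and a gaze/pupil time series) whose features are fused by concatenation before a shared classification head. All module names, dimensions, the fusion-by-concatenation choice, and the use of PyTorch are assumptions made here for clarity; the thesis itself is under embargo and its actual architecture may differ.

```python
# Hypothetical sketch of a two-representation fusion classifier (not the thesis method).
import torch
import torch.nn as nn

class PeriocularEncoder(nn.Module):
    """Encodes a periocular image into a feature vector (illustrative small CNN)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, x):  # x: (batch, 3, H, W)
        return self.fc(self.conv(x).flatten(1))

class EyeMovementEncoder(nn.Module):
    """Encodes a gaze/pupil time series into a feature vector (illustrative GRU)."""
    def __init__(self, in_channels=3, feat_dim=128):
        super().__init__()
        self.rnn = nn.GRU(in_channels, feat_dim, batch_first=True)

    def forward(self, x):  # x: (batch, time, channels), e.g. gaze x/y + pupil size
        _, h = self.rnn(x)
        return h[-1]

class FusionClassifier(nn.Module):
    """Concatenates both representations' features and predicts an emotion class."""
    def __init__(self, feat_dim=128, num_emotions=4):
        super().__init__()
        self.img_enc = PeriocularEncoder(feat_dim)
        self.seq_enc = EyeMovementEncoder(feat_dim=feat_dim)
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, num_emotions),
        )

    def forward(self, frames, signals):
        fused = torch.cat([self.img_enc(frames), self.seq_enc(signals)], dim=-1)
        return self.head(fused)

# Example forward pass with random data: 2 samples, 64x64 periocular crops,
# 50 time steps of (gaze_x, gaze_y, pupil_size).
model = FusionClassifier()
logits = model(torch.randn(2, 3, 64, 64), torch.randn(2, 50, 3))
print(logits.shape)  # torch.Size([2, 4])
```

Concatenation is only the simplest fusion strategy; attention-based or learned weighting schemes are common alternatives for combining such representations.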

Files

Tony_Thesis_Emotion_Recognitio... (pdf)

File under embargo until 22-10-2025