Unsupervised Facial Expression Recognition using Periocular Images

Abstract

Facial expression recognition on head-mounted devices (HMDs) is an intriguing research field because of its potential in applications such as interactive virtual-reality video meetings. Existing work focuses on building supervised learning pipelines that rely on large amounts of labeled periocular images captured by the devices' built-in cameras. However, labeling requires intensive manual work, which is costly and time-consuming. In this thesis, we apply self-supervised learning techniques that leverage unlabeled periocular images to learn representations for facial expression recognition. Once pre-trained through self-supervised learning, the model is transferred as a feature extractor to form multiple individual inference models. In addition, because most deep-learning models accept only low-resolution inputs due to the heavy computational cost of large input sizes, images taken by the modern cameras on HMDs are often aggressively down-sampled to match the model input. To improve the model's usability and speed, we train the models on frequency representations of the images rather than conventional RGB pixel arrays. We conducted extensive evaluation experiments on a dataset collected from 23 invited volunteers. The results show that the self-supervised models achieve performance comparable to existing approaches while reducing the number of labels used by approximately 99%. By training in the frequency domain with a reduced input size, the model is 1.89 times faster to train and 1.23 times faster at inference.
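The frequency-domain idea from the abstract can be illustrated with a minimal sketch: a 2D DCT-II turns an image into frequency coefficients, and keeping only the low-frequency block yields a much smaller model input than the full pixel array. This is an illustrative assumption rather than the thesis's exact pipeline (the abstract does not name the transform), and `dct_matrix` and `to_frequency_input` are hypothetical helper names.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (rows are frequency components).
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)  # DC row gets the orthonormal scaling
    return C

def to_frequency_input(img, keep=32):
    # Full 2D DCT-II of a square grayscale image, then a low-frequency
    # crop: the top-left `keep` x `keep` coefficients carry most of the
    # signal energy, so the model input shrinks without naive
    # pixel-space down-sampling. (Hypothetical helper, not thesis code.)
    n = img.shape[0]
    C = dct_matrix(n)
    coeffs = C @ img @ C.T
    return coeffs[:keep, :keep]

img = np.random.default_rng(0).random((128, 128))
freq = to_frequency_input(img, keep=32)
print(freq.shape)  # prints (32, 32): 16x fewer inputs than 128x128 pixels
```

Because the DCT matrix is orthonormal, keeping all coefficients (`keep=n`) reconstructs the image exactly; truncating trades a little high-frequency detail for the smaller input that the abstract credits with the 1.89x training and 1.23x inference speedups.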

Files

Yuan_Fu_Unsupervised_Facial_Ex... (PDF | 6.9 MB)
Embargo expired on 31-03-2023