Unsupervised Facial Expression Recognition using Periocular Images

Abstract

Facial expression recognition on head-mounted devices (HMDs) is an intriguing research field because of its potential in applications such as interactive virtual-reality video meetings. Existing work focuses on building supervised learning pipelines that rely on large amounts of labeled periocular images captured by the devices' built-in cameras. However, labeling requires intensive manual work, which is costly and time-consuming. In this thesis, we apply self-supervised learning techniques that leverage unlabeled periocular images to learn representations for facial expression recognition. Once pre-trained through self-supervised learning, the model is transferred as a feature extractor to form multiple individual inference models. In addition, because most deep-learning models accept only low-resolution inputs due to the heavy computational cost of large input sizes, images taken by the modern cameras on HMDs are often aggressively down-sampled to match the model input. To improve the model's usability and speed, we train the models on frequency representations of the images rather than conventional RGB pixel arrays. We conducted extensive evaluation experiments on a dataset collected from 23 invited volunteers. The results show that the self-supervised models achieve performance comparable to existing approaches while reducing the number of labels used by approximately 99%. By training in the frequency domain with a reduced input size, the model is 1.89 times faster to train and 1.23 times faster at inference.
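The frequency-domain idea from the abstract can be illustrated with a minimal sketch: a 2D DCT-II turns an image into frequency coefficients, and keeping only the low-frequency block yields a much smaller model input than the full pixel array. This is an illustrative assumption rather than the thesis's exact pipeline (the abstract does not name the transform), and `dct_matrix` and `to_frequency_input` are hypothetical helper names.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (rows are frequency components).
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)  # DC row gets the orthonormal scaling
    return C

def to_frequency_input(img, keep=32):
    # Full 2D DCT-II of a square grayscale image, then a low-frequency
    # crop: the top-left `keep` x `keep` coefficients carry most of the
    # signal energy, so the model input shrinks without naive
    # pixel-space down-sampling. (Hypothetical helper, not thesis code.)
    n = img.shape[0]
    C = dct_matrix(n)
    coeffs = C @ img @ C.T
    return coeffs[:keep, :keep]

img = np.random.default_rng(0).random((128, 128))
freq = to_frequency_input(img, keep=32)
print(freq.shape)  # prints (32, 32): 16x fewer inputs than 128x128 pixels
```

Because the DCT matrix is orthonormal, keeping all coefficients (`keep=n`) reconstructs the image exactly; truncating trades a little high-frequency detail for the smaller input that the abstract credits with the 1.89x training and 1.23x inference speedups.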

Files

Yuan_Fu_Unsupervised_Facial_Ex... (PDF | 6.9 MB)
Embargo expired on 31-03-2023