Data Augmentation for Deep Learning-based Gaze Estimation
Abstract
This study provides insight into applying different data augmentation techniques to the input of a convolutional neural network that estimates gaze. Gaze is used in numerous research domains to understand and predict human emotions and actions. Data augmentation comprises techniques that increase the size, variance, and quality of training data, and it is widely used to reduce overfitting and improve the accuracy of deep learning models. This research combines the two fields by first applying individual data augmentations to the task of gaze estimation and then combining the most useful methods to reduce the mean angular error further. The results show that small geometric transformations, such as translating the image by 15% or flipping it horizontally 50% of the time, give the largest reductions in mean angular error. Among the individually applied augmentation methods, flipping gave the best improvement: 33% and 35% for the two models, respectively, relative to the baseline model. The best result was obtained by combining flipping with translation, which gave mean angular errors of 1.396 and 1.389 for the two models. Obtaining these results required a large amount of training, which was the main limitation in conducting the experiments.
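To make the two best-performing augmentations and the evaluation metric concrete, the following is a minimal NumPy sketch and not the implementation used in the study. It rests on several assumptions that the abstract does not state: gaze labels are 3D direction vectors whose x component lies along the image's horizontal axis (so a horizontal flip negates x), translation is a random shift of up to 15% of each image dimension with zero padding, the gaze label is left unchanged by translation, and the mean angular error is reported in degrees. The names `random_flip`, `shift_image`, `random_translate`, and `mean_angular_error` are hypothetical.

```python
import numpy as np


def mean_angular_error(pred, true):
    """Mean angle, in degrees, between predicted and ground-truth gaze vectors.

    Both inputs are arrays of shape (N, 3); they are normalised here.
    """
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    true = true / np.linalg.norm(true, axis=1, keepdims=True)
    # Clip guards against values slightly outside [-1, 1] from rounding.
    cos = np.clip(np.sum(pred * true, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())


def random_flip(image, gaze, rng, p=0.5):
    """Mirror the image left-right with probability p.

    Assumption: gaze[0] is the horizontal component, so mirroring negates it.
    """
    if rng.random() < p:
        image = image[:, ::-1].copy()
        gaze = gaze * np.array([-1.0, 1.0, 1.0])
    return image, gaze


def shift_image(image, dy, dx):
    """Shift an (H, W) or (H, W, C) image by (dy, dx) pixels, zero padded."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out


def random_translate(image, rng, max_frac=0.15):
    """Shift by a random offset of up to max_frac of each image dimension.

    Assumption: the gaze label is treated as unchanged by a small shift.
    """
    h, w = image.shape[:2]
    max_dy, max_dx = int(max_frac * h), int(max_frac * w)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    return shift_image(image, dy, dx)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((36, 60))            # hypothetical grayscale eye crop
    gaze = np.array([0.2, -0.1, -1.0])      # hypothetical gaze direction
    image, gaze = random_flip(image, gaze, rng)
    image = random_translate(image, rng)
    print(image.shape, gaze)
```

Applying the flip before the translation, as in the usage lines at the bottom, is one possible ordering; the abstract does not say in which order the combined augmentations were applied.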