Automatic data collection for facial expression recognition
Abstract
Facial expression recognition is a widely researched topic with applications in human-computer interaction (HCI), surveillance, and related fields. The diversity of expressions makes data collection expensive in time and money, so collecting task-specific data is often impractical. This thesis investigates a method for collecting such data quickly and cheaply, with a focus on an HCI application: Pepper, a robot designed to interact with humans through conversation. Collecting this data requires triggering emotions in participants while video-recording their faces. Emotions were triggered by showing videos selected to elicit specific emotions; participants watched them in pairs, so that mutual interaction better simulated the social environment in which the robot will operate. The videos were selected through a questionnaire in which people rated the emotions each video triggered in them. After watching each video, participants rated their feelings with the AffectButton, a tool for intuitively describing emotions in dimensional terms. The recordings of participants watching the triggering videos were then compared against a model of each participant's own neutral expression, in order to select the frames showing an expression and to discard neutral and transition frames. The selected frames formed a dataset used to train a linear regressor and a convolutional neural network (CNN). Both models were then tested on naturalistic data recorded during conversations, to investigate whether the proposed collection method could build a useful dataset; the results showed the method to be promising.
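The abstract does not specify how frames are compared against the participant's neutral-expression model. A minimal sketch of one plausible criterion, assuming aligned grayscale face crops and using a per-pixel mean as the neutral model with a hand-tuned distance threshold (both the distance measure and the threshold are assumptions, not taken from the thesis):

```python
import numpy as np

def select_expressive_frames(frames, neutral_frames, threshold=12.0):
    """Keep frames that differ sufficiently from the participant's own
    neutral-expression model, discarding neutral and transition frames.
    Both inputs are lists of aligned grayscale face crops with identical
    shape (H, W)."""
    # Model the neutral expression as the per-pixel mean over the
    # participant's neutral recording.
    neutral_model = np.stack(neutral_frames).astype(np.float64).mean(axis=0)

    selected = []
    for frame in frames:
        # Mean absolute deviation from the neutral model; frames close
        # to the model are treated as neutral or transitional and skipped.
        distance = np.abs(frame.astype(np.float64) - neutral_model).mean()
        if distance > threshold:
            selected.append(frame)
    return selected
```

In practice the threshold would be calibrated per participant, since lighting and face alignment shift the baseline distance.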
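The AffectButton describes emotions dimensionally, which suggests labels of the pleasure-arousal-dominance (PAD) kind in [-1, 1]. A sketch of the linear-regressor step under that assumption, with flattened 48x48 face crops as features and ridge regularization as an illustrative choice (the feature representation, image size, and regressor variant are all assumptions):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Placeholder data standing in for the collected dataset:
# flattened 48x48 face crops and PAD triples in [-1, 1].
X_train = np.random.rand(200, 48 * 48)
y_train = np.random.uniform(-1, 1, (200, 3))

# Ridge handles multi-output targets directly, so one model
# predicts all three affect dimensions at once.
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)

pad_prediction = model.predict(X_train[:1])
print(pad_prediction)  # e.g. [[0.31, -0.12, 0.05]]
```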
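For the CNN mentioned alongside the linear regressor, a small sketch of one possible architecture regressing the same three dimensions (the layer sizes and input shape are hypothetical; the thesis does not describe the network in the abstract):

```python
import tensorflow as tf

# A compact CNN regressing three affect dimensions from
# 48x48 grayscale face crops (shapes are assumptions).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(48, 48, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    # tanh keeps predictions in [-1, 1], matching the label range
    tf.keras.layers.Dense(3, activation="tanh"),
])
model.compile(optimizer="adam", loss="mse")
```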