Detecting speech from body movements

A look into the nature of speech based on neural networks and multi-source domain adaptation

Abstract

Our research focuses on detecting speech from body movements, using wearable accelerometer data collected at an in-the-wild mingling event. We aim to explore the nature of the connection between speech and body movements and, more specifically, we emphasize the person-specificity of speech. Many studies have shown that speech is accompanied by unconscious body behaviours, and that speech and body movements are strongly correlated and synchronized. Previous research has also shown that human behaviour is highly person-specific. In other words, in our experimental setup, the accelerometer data collected from different persons follow different distributions. Based on these two considerations, our work consists of two phases. In the first phase, we investigate the use of convolutional and recurrent neural networks for learning informative representations from raw body acceleration readings. The proposed model outperforms the state-of-the-art approach presented in [5] by 6% in area under the curve (AUC) on the same data. In the second phase, we visualize the features extracted by the proposed model. The results show that the distributions of data obtained from different individuals can differ, which is the person-specificity of the problem. We then adopt two multi-source domain adaptation approaches [6], built on the features extracted by our model, with the aim of forming a personalized speech detection model for each person in our dataset. The first approach is transductive parameter transfer (TPT) [5]. It deduces the personalized model of the target domain from the well-trained models of several source domains, under the assumption that individuals with similar marginal distributions should also have similar decision boundaries.
The second approach is a sample re-weighting method, in which the training samples from different persons are re-weighted according to how similar their conditional and marginal distributions are to those of the target person. We use the re-weighted samples to train a personalized model for each target person. Both approaches achieved only a slight relative performance increase over the general neural network model trained on all the data. We then discuss possible reasons why these two methods did not bring significant improvement, and which alternative solutions could be explored in the future.
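
The sample re-weighting idea can be illustrated with a minimal sketch. It is not the implementation used in the thesis: it assumes that the similarity between a source person and the target person is measured by a Gaussian kernel on the distance between their mean feature vectors, which is only a crude stand-in for marginal-distribution similarity and ignores the conditional distributions entirely. The names `source_weights`, `sample_weights`, `bandwidth` and the toy data are hypothetical.

```python
import math


def mean_vector(samples):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def source_weights(source_features, target_features, bandwidth=1.0):
    """Weight each source person by similarity to the target person.

    Assumption: similarity is a Gaussian kernel on the squared distance
    between mean feature vectors (a proxy for marginal similarity only).
    The returned weights are normalized to sum to 1.
    """
    target_mean = mean_vector(target_features)
    raw = {}
    for person, feats in source_features.items():
        source_mean = mean_vector(feats)
        dist_sq = sum((a - b) ** 2 for a, b in zip(source_mean, target_mean))
        raw[person] = math.exp(-dist_sq / (2 * bandwidth ** 2))
    total = sum(raw.values())
    return {person: w / total for person, w in raw.items()}


def sample_weights(source_features, weights):
    """Expand per-person weights to one weight per training sample."""
    return [weights[p] for p, feats in source_features.items() for _ in feats]


# Toy data: two source persons with 2-D features, and one target person.
sources = {
    "person_a": [[0.0, 0.1], [0.2, -0.1]],  # close to the target
    "person_b": [[2.0, 2.1], [1.8, 1.9]],   # far from the target
}
target = [[0.1, 0.0], [0.0, 0.1]]

w = source_weights(sources, target)
per_sample = sample_weights(sources, w)
```

In this toy example, `person_a` receives a larger weight than `person_b` because its features lie closer to the target's. The per-sample weights could then be passed to any training procedure that accepts sample weights, for example as a weighted loss.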

Files

Thesis_xianhao_ni.pdf
(pdf | 2.65 MB)
Unknown license