Detecting speech from body movements

Ni, X.

A look into the nature of speech based on neural networks and multi-source domain adaptation

Master thesis (2019)

Authors

X. Ni, Electrical Engineering, Mathematics and Computer Science

Contributors

H.S. Hung, Pattern Recognition and Bioinformatics (mentor)

David Tax, Pattern Recognition and Bioinformatics (coach)

E. Gedik, Pattern Recognition and Bioinformatics (mentor)

Faculty

Electrical Engineering, Mathematics and Computer Science

Neural Network, Multi-source Domain Adaptation, Speech Detection

To reference this document use:

http://resolver.tudelft.nl/uuid:0f79d992-7add-442e-a227-00e939b5bb83

Published Date

01-09-2019

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Our research focuses on detecting speech from body movements, using wearable accelerometer data collected at an in-the-wild mingling event. We aim to explore the nature of the connection between speech and body movements and, more specifically, we emphasize the person-specificity of speech. Many studies have shown that speech is accompanied by unconscious body behaviours, with strong correlation and synchrony between speech and body movements. Previous research has also shown that human behaviour is highly person-specific. In our experimental setup, this means that the accelerometer data collected from different persons follow different distributions.

Based on these two considerations, our work consists of two phases. In the first phase, we investigate convolutional and recurrent neural networks for learning informative representations from raw body acceleration readings. The proposed model outperforms the state-of-the-art approach presented in [5] by 6% in Area Under the Curve (AUC) on the same data.

In the second phase, we visualize the features extracted by the proposed model. The results show that the distributions of data obtained from different individuals can differ, which is the person-specificity of the problem. We then apply two multi-source domain adaptation [6] approaches to the features extracted by our model, aiming to build a personalized speech detection model for each person in our dataset. The first approach is transductive parameter transfer (TPT) [5]. It infers the personalized model of the target domain from well-trained models of several source domains, based on the assumption that individuals with similar marginal distributions should also have similar decision boundaries. The second approach is a sample re-weighting method in which the training samples from different persons are re-weighted according to the similarity of their conditional and marginal distributions to those of the target person. We use the re-weighted samples to train a personalized model for each target person.

Both approaches achieved only a small performance increase over the general neural network model trained on all the data. We discuss possible reasons why these two methods did not bring significant improvement and what alternative solutions could be explored in the future.
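The sample re-weighting idea in the abstract can be illustrated with a minimal sketch. It is not the method implemented in the thesis. It assumes that each person is represented by a matrix of already extracted features, and it measures similarity to the target person with a Gaussian kernel on the distance between mean feature vectors, which is a simplified stand-in for comparing marginal distributions only. The function names, the `bandwidth` parameter and the logistic regression classifier are all hypothetical choices made for illustration.

```python
import numpy as np


def source_weights(source_feats, target_feats, bandwidth=1.0):
    """Return one normalized weight per source person.

    Assumption: similarity is a Gaussian kernel on the distance between the
    mean feature vector of a source person and that of the target person.
    """
    target_mean = target_feats.mean(axis=0)
    dists = np.array(
        [np.linalg.norm(f.mean(axis=0) - target_mean) for f in source_feats]
    )
    weights = np.exp(-(dists ** 2) / (2.0 * bandwidth ** 2))
    return weights / weights.sum()


def fit_weighted_logreg(X, y, sample_weight, lr=0.1, n_iter=500):
    """Fit logistic regression by gradient descent on a weighted log-loss."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(X1.shape[1])
    sw = sample_weight / sample_weight.sum()
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ w)))  # predicted speaking probability
        w -= lr * (X1.T @ (sw * (p - y)))    # weighted gradient step
    return w


def personalized_model(source_feats, source_labels, target_feats, bandwidth=1.0):
    """Train a classifier for the target person on re-weighted source samples."""
    per_person = source_weights(source_feats, target_feats, bandwidth)
    X = np.vstack(source_feats)
    y = np.concatenate(source_labels)
    # Every sample inherits the weight of the person it was recorded from.
    sw = np.concatenate(
        [np.full(len(f), wt) for f, wt in zip(source_feats, per_person)]
    )
    return fit_weighted_logreg(X, y, sw)
```

In this sketch, source persons whose features resemble the target's dominate the training loss, while dissimilar persons contribute little. The `bandwidth` should be on the scale of the distances between persons, since a much smaller value drives all weights toward zero.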

Files

Thesis_xianhao_ni.pdf

(pdf | 2.65 Mb)

Unknown license