Comparing and Analyzing Different Speech Conversion Techniques for Transforming Dysarthric to Normal Speech

Liu, J.

Master thesis (2024)

Authors

J. Liu (Electrical Engineering, Mathematics and Computer Science)

Contributors

O.E. Scharenborg (mentor)

Q. Song, Embedded Systems (graduation committee member)

Z. Yue (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

Intelligibility, Naturalness, Dysarthric speech recognition, Voice conversion

To reference this document use:

http://resolver.tudelft.nl/uuid:f25eb395-75ea-43dc-a9b9-4646ef077eab

Published Date

29-05-2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Dysarthric speech, characterized by articulation problems and a slower speech rate, yields lower automatic speech recognition (ASR) performance than normal speech. To improve performance, researchers often enhance dysarthric speech so that it more closely resembles normal speech before passing it to an ASR system trained on normal speech. In this project, we compare different signal processing and voice conversion techniques for dysarthric-to-normal speech enhancement. The enhanced speech is evaluated objectively using an ASR system trained on normal speech, and its naturalness and intelligibility are evaluated subjectively through listening experiments. Finally, the correlation between the subjective and objective evaluations is analyzed. Among the techniques investigated, time-stretching performed best in the objective evaluation, surpassing state-of-the-art voice conversion methods. Across all methods, improvements in naturalness and intelligibility were positively correlated with improvements in ASR performance, although this correlation was significant for only some of the methods.
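
The objective evaluation and the correlation analysis described in the abstract can be illustrated with a minimal sketch. It assumes that ASR performance is scored as word error rate (WER) and that the subjective and objective results are related through a Pearson correlation; this record does not state which metric or statistic the thesis actually uses. The function names and the example data are hypothetical and are not taken from the thesis.

```python
from math import sqrt


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: WER is the ASR metric; the abstract does not name one.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Illustrative values only, not results from the thesis.
    reference = "please open the front door"
    wer_before = word_error_rate(reference, "please open front for")
    wer_after = word_error_rate(reference, "please open the front for")
    print("WER reduction:", wer_before - wer_after)

    # Hypothetical per-utterance gains: listener rating vs. WER reduction.
    rating_gain = [0.2, 0.5, 0.9, 1.1]
    wer_reduction = [0.05, 0.10, 0.22, 0.20]
    print("Pearson r:", pearson(rating_gain, wer_reduction))
```

A positive coefficient here would correspond to the trend the abstract reports, where larger gains in naturalness or intelligibility go together with larger gains in ASR performance. Judging whether such a correlation is significant requires an additional statistical test that this sketch does not include.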

Files

TUD_Msc_Thesis_Jingxian.pdf

- Embargo expired on 01-10-2024

Unknown license