Comparing and Analyzing Different Speech Conversion Techniques for Transforming Dysarthric to Normal Speech

Abstract

Dysarthric speech, characterized by articulation problems and a slower speech rate, yields lower automatic speech recognition (ASR) performance than normal speech. To improve performance, researchers often enhance dysarthric speech so that it more closely resembles normal speech before passing it to an ASR system trained on normal speech. In this project, we compare different signal processing and voice conversion techniques for dysarthric-to-normal speech enhancement. The enhanced speech is evaluated objectively using an ASR system trained on normal speech, and its naturalness and intelligibility are evaluated subjectively through listening experiments. Finally, we analyze the correlation between the subjective and objective evaluations. Among the techniques investigated, time-stretching performed best in the objective evaluation, surpassing state-of-the-art voice conversion methods. Across all methods, improvements in naturalness and intelligibility were positively correlated with improvements in ASR performance; however, this correlation was significant for some methods but not for others.
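
As a rough illustration of the objective evaluation and correlation analysis summarized above, the sketch below computes a word error rate (WER) from an ASR transcript and a Pearson correlation coefficient between two lists of scores. It rests on assumptions not stated in the abstract: that ASR performance is measured as WER, and that the correlation is Pearson's r. The function names `word_error_rate` and `pearson_r` are hypothetical and are not taken from the thesis.

```python
from math import sqrt


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: WER is the ASR metric; the abstract only says "ASR performance".
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences.

    Assumption: Pearson's r; the abstract does not name the correlation measure.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

In such a setup, one would compute the WER before and after enhancement for each utterance or speaker, take the difference as the objective improvement, and pass it to `pearson_r` together with the matching improvement in listener ratings. Whether the resulting coefficient is significant requires a separate test, which this sketch omits.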

Files

TUD_Msc_Thesis_Jingxian.pdf
- Embargo expired on 01-10-2024