Improving State-of-the-Art ASR Systems for Speakers with Dysarthria

Applying Low-Rank Adaptation Transfer Learning to Whisper


Abstract

Dysarthria is a speech disorder that limits an individual's ability to articulate clearly, owing to the weakening of the muscles involved in speech. Despite recent advances in Automatic Speech Recognition (ASR), recognising dysarthric speech remains a significant challenge because of the limited availability of dysarthric speech data, high variability between speakers, and the mismatch between typical and dysarthric speech patterns. This paper addresses these challenges by using transfer learning and Low-Rank Adaptation (LoRA) to improve the performance of the state-of-the-art ASR model Whisper on dysarthric speech. By fine-tuning Whisper on the TORGO dataset, this study aims to adapt the pre-trained models to better recognise dysarthric speech patterns, thereby reducing Word Error Rates (WER) and improving accessibility for individuals with speech impairments. Experimental results indicate that this approach can improve recognition performance: the Large-V2 and Large-V3 models and their corresponding distilled models all achieved a lower WER after fine-tuning. The Large-V3 model achieved the greatest relative WER reduction, at 22.65%.
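The results above are reported as WER and as a relative WER reduction. WER is the word-level edit distance (substitutions + deletions + insertions) between a hypothesis and a reference transcript, divided by the number of reference words; a relative reduction expresses the drop from the baseline WER to the fine-tuned WER as a percentage of the baseline. The following is a minimal pure-Python sketch of both definitions, not the evaluation code used in this study. The function names `word_error_rate` and `relative_wer_reduction` and the example transcripts and WER values are hypothetical, and any text normalisation (casing, punctuation) the study may apply before scoring is omitted.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed so far
    # and the first j hypothesis words (a rolling single-row DP table).
    prev = list(range(len(hyp) + 1))
    for i, r_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h_word in enumerate(hyp, start=1):
            cost = 0 if r_word == h_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, finetuned_wer: float) -> float:
    """Percentage by which the fine-tuned WER is lower than the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - finetuned_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts, not drawn from TORGO.
    reference = "the quick brown fox jumps"
    hypothesis = "the quick brown box jumps over"
    # 1 substitution (fox -> box) + 1 insertion (over) over 5 reference words
    print(word_error_rate(reference, hypothesis))  # 0.4
    # Hypothetical WER values: a drop from 0.40 to 0.30 is a 25% relative reduction
    print(round(relative_wer_reduction(0.40, 0.30), 2))
```

Because the reduction is relative, the same absolute drop in WER yields a larger percentage when the baseline WER is lower, so relative and absolute improvements should not be compared directly.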
