Dysarthric speech recognition is challenging due to the speech variability caused by neurological disorders. This study explores integrating articulatory features with features extracted by large pre-trained acoustic models (e.g., WavLM, Whisper) to improve recognition performance. Different fusion strategies, including concatenation and cross-attention mechanisms, are also compared in this work. Experimental results show that articulatory features can enhance WavLM-extracted features, reducing WER for the moderate and mild severity levels. t-SNE analysis reveals how articulatory features influence the learned feature representations. These findings highlight the potential of multimodal fusion for improving dysarthric ASR systems.
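To make the two fusion strategies concrete, the following is a minimal PyTorch sketch of concatenation fusion and cross-attention fusion over frame-aligned acoustic and articulatory feature streams. The feature dimensions, module names, and the residual connection are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the two fusion strategies compared in this work.
# Dimensions and module names are illustrative assumptions.
import torch
import torch.nn as nn


class ConcatFusion(nn.Module):
    """Concatenate acoustic and articulatory features along the
    feature axis, then project back to the model dimension."""

    def __init__(self, d_acoustic: int, d_artic: int, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_acoustic + d_artic, d_model)

    def forward(self, acoustic: torch.Tensor, artic: torch.Tensor) -> torch.Tensor:
        # acoustic: (batch, time, d_acoustic); artic: (batch, time, d_artic)
        return self.proj(torch.cat([acoustic, artic], dim=-1))


class CrossAttentionFusion(nn.Module):
    """Use acoustic features as queries and articulatory features as
    keys/values, so each acoustic frame attends to articulatory context."""

    def __init__(self, d_acoustic: int, d_artic: int, d_model: int, n_heads: int = 4):
        super().__init__()
        self.q = nn.Linear(d_acoustic, d_model)
        self.kv = nn.Linear(d_artic, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, acoustic: torch.Tensor, artic: torch.Tensor) -> torch.Tensor:
        q = self.q(acoustic)
        kv = self.kv(artic)
        fused, _ = self.attn(q, kv, kv)
        # Residual connection (an assumption here) keeps acoustic information intact.
        return fused + q


if __name__ == "__main__":
    # WavLM-Large hidden size is 1024; the articulatory dimension (24)
    # is an assumed placeholder for EMA-style trajectory features.
    acoustic = torch.randn(2, 100, 1024)
    artic = torch.randn(2, 100, 24)
    print(ConcatFusion(1024, 24, 512)(acoustic, artic).shape)          # (2, 100, 512)
    print(CrossAttentionFusion(1024, 24, 512)(acoustic, artic).shape)  # (2, 100, 512)
```

Either fused representation would then feed the downstream ASR decoder; concatenation mixes the streams uniformly per frame, while cross-attention lets the model weight articulatory evidence adaptively.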