Evaluating the Use of Pitch Shifting to Improve Automatic Speech Recognition Performance on Southern Dutch Accents
Abstract
Building automatic speech recognizers (ASRs) remains a challenge for languages whose corpora or data sets are too small. A further major issue in speech corpora is bias against regionally accented speech and other speaker attributes. Techniques known as data augmentations can improve ASR performance and reduce these biases. One audio data augmentation, pitch shifting, has increased ASR performance in other experiments. In this paper, pitch shifting is tested on the JASMIN-CGN speech data from the Southern regions of the Netherlands. Using a hybrid GMM-HMM ASR, two baselines are developed: one using all speech data from the region, the other using only native speech. For the former, pitch shifting is found neither to improve Word Error Rate (WER) nor to reduce bias. For the latter, augmentation improves WER and reduces bias for certain speaker groups.
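For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used in the paper; the function name `word_error_rate` and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is plain
    whitespace splitting, which real scoring tools usually refine.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("zit" -> "zat") and one deletion ("de")
# against a six-word reference.
example_wer = word_error_rate("de kat zit op de mat", "de kat zat op mat")
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference. Comparing this value across speaker groups is one simple way to look at bias, although the paper's own bias measure may be defined differently.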