How do ASR systems of Google and Microsoft compare when recognizing Dutch spoken by native speakers over the age of 60?

Abstract

Automatic Speech Recognition (ASR) systems are widely deployed and used by many people. Some groups, specifically older Dutch adults, are recognized less accurately by these systems. Given the aging population of the Netherlands, more inclusive ASR systems would allow older adults greater independence. I tested the ASR systems of Google and Microsoft on the JASMIN dataset and compared them using word error rate (WER), word information lost (WIL), and character error rate (CER). The results show Microsoft outperforming Google, with an average WER of 19.6% compared to 27.35%. However, Google is less biased with respect to gender and age. Microsoft is less biased with respect to region, but only by a small margin. Overall, the most notable findings for both systems are a small bias toward female speakers and a strong bias against speakers from the southern regions of the Netherlands. These findings highlight the need for more inclusive ASR systems that enhance the independence of older adults.
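
For readers unfamiliar with the three metrics, the following is a minimal sketch of how WER, CER, and WIL are commonly defined. It is not the evaluation code used in this work. The function names and the Dutch example sentences are hypothetical, and the tie-breaking rule (preferring the alignment with the most exact matches among all minimum-edit alignments) is an assumption that other implementations may not share.

```python
def align_counts(ref, hyp):
    """Return (edits, hits) for a minimum-edit-distance alignment of two sequences.

    Each DP cell holds (edit cost, negative hit count), so min() prefers
    fewer edits first and, among ties, more exact matches (an assumption).
    """
    prev = [(j, 0) for j in range(len(hyp) + 1)]  # empty reference: j insertions
    for i, r in enumerate(ref, 1):
        cur = [(i, 0)]  # empty hypothesis: i deletions
        for j, h in enumerate(hyp, 1):
            diag_cost, diag_hits = prev[j - 1]
            if r == h:
                diag = (diag_cost, diag_hits - 1)      # exact match (hit)
            else:
                diag = (diag_cost + 1, diag_hits)      # substitution
            dele = (prev[j][0] + 1, prev[j][1])        # deletion
            ins = (cur[j - 1][0] + 1, cur[j - 1][1])   # insertion
            cur.append(min(diag, dele, ins))
        prev = cur
    edits, neg_hits = prev[-1]
    return edits, -neg_hits


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    edits, _ = align_counts(ref_words, hypothesis.split())
    return edits / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    edits, _ = align_counts(list(reference), list(hypothesis))
    return edits / len(reference)


def wil(reference, hypothesis):
    """Word information lost: 1 - (hits / N_ref) * (hits / N_hyp)."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    if not hyp_words:
        return 1.0  # nothing was recognized, so all information is lost
    _, hits = align_counts(ref_words, hyp_words)
    return 1.0 - (hits / len(ref_words)) * (hits / len(hyp_words))


if __name__ == "__main__":
    reference = "de kat zit op de mat"   # hypothetical transcript
    hypothesis = "de kat zat op mat"     # hypothetical ASR output
    print(f"WER={wer(reference, hypothesis):.3f}")
    print(f"CER={cer(reference, hypothesis):.3f}")
    print(f"WIL={wil(reference, hypothesis):.3f}")
```

WER and CER can exceed 1 when the hypothesis contains many insertions, whereas WIL always stays between 0 and 1. That difference is one reason for reporting several metrics side by side.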
