Evaluating the Effect of SpecSwap for Purposes of Improving WER Performance of the Western Dutch Region Using the JASMIN-CGN Dataset

Marinov, A.

Bachelor thesis (2022)

Authors

A. Marinov (Electrical Engineering, Mathematics and Computer Science)

Contributors

Odette Scharenborg (mentor)

Tanvina Patel (mentor)

Joana P. P. Gonçalves, Pattern Recognition and Bioinformatics (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

Speech recognition, Speech augmentation, Bias, Bias reduction, WER, SpecSwap

To reference this document use:

http://resolver.tudelft.nl/uuid:a91366a7-5843-4ed2-baa0-b7f1b2dc57ef

More Info

Published Date

22-06-2022

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A prevalent problem in many modern Automatic Speech Recognition (ASR) systems is the presence of bias, and how to reduce it. Bias can be observed when an ASR system performs worse on a subset of its speakers than on the rest, rather than generalizing equally well to all speakers. This can be measured using the Word Error Rate (WER). The type of bias differs depending on the ASR system in question. However, data augmentation techniques for recorded speech have been proposed and shown to reduce WER and, in turn, bias. These techniques perturb the audio; the perturbed audio is then added to the model's training set, and the model is retrained on the enlarged set. One such technique is SpecSwap. This paper explores how using SpecSwap affects the WER performance of a hybrid-model ASR system on the West-Dutch region of the JASMIN-CGN dataset. For comparison, Vocal Tract Length Perturbation (VTLP), a state-of-the-art data augmentation technique that has been shown to be effective in other cases, was also used. Both experiments led to a consistent increase in WER. It was therefore concluded that too little data was available for the region for the augmentation policy to be effective, either in any of the subcategories or in the overall performance of the system. However, SpecSwap shows potential in mitigating the widely discussed gender bias in ASR systems by reducing the difference between male and female speakers' WER.
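
To make the metric in the abstract concrete, the following is a minimal sketch of how WER, and the WER gap between two speaker groups, could be computed. It is not the thesis's evaluation pipeline: the function names (edit_distance, wer, wer_gap) and the toy transcripts are assumptions introduced purely for illustration, and the sketch uses the standard definition of WER as word-level edit distance divided by the number of reference words.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (reference transcript, ASR hypothesis)


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def wer(pairs: List[Pair]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    if words == 0:
        raise ValueError("no reference words to score against")
    return errors / words


def wer_gap(groups: Dict[str, List[Pair]], a: str, b: str) -> float:
    """Absolute WER difference between two speaker groups (hypothetical bias measure)."""
    return abs(wer(groups[a]) - wer(groups[b]))


# Made-up transcripts, for illustration only (not JASMIN-CGN data).
toy_groups = {
    "female": [("de kat zit op de mat", "de kat zit op de mat")],
    "male": [("de kat zit op de mat", "de kat zat op mat")],
}
```

Calling wer_gap(toy_groups, "female", "male") gives the kind of between-group difference the abstract refers to when it discusses gender bias; a smaller gap after retraining on augmented data would suggest reduced bias, under the assumption that the gap is measured on the same test set before and after.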

Files

A.Marinov_RP_Paper_Final.pdf

(pdf | 1.67 Mb)

Unknown license