Evaluating the Use of Pitch Shifting to Improve Automatic Speech Recognition Performance on Southern Dutch Accents

Mešić, A.

Bachelor thesis (2022)

Authors

A. Mešić (Electrical Engineering, Mathematics and Computer Science)

Contributors

Tanvina Patel (mentor)

O.E. Scharenborg (mentor)

Joana P. P. Gonçalves, Pattern Recognition and Bioinformatics (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

ASR, Data Augmentation, JASMIN-CGN, Audio augmentation, Bias, Speech recognition, Hybrid ASR, Pitch Shift, Dutch

To reference this document use:

http://resolver.tudelft.nl/uuid:f1f54596-adbc-436a-87ac-a40394689b92

Published Date

22-06-2022

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Building Automatic Speech Recognizers (ASRs) is a challenge for languages whose corpora or data sets are too small. A further major issue in language corpora is bias related to regional accents and other speaker attributes. Techniques known as data augmentations can improve ASR performance and reduce bias in these corpora. One audio data augmentation, pitch shifting, has improved ASR performance in other experiments. In this paper, pitch shifting is tested on the JASMIN-CGN speech data set from the Southern regions of the Netherlands. Using a hybrid GMM-HMM ASR, two baselines are developed: one using all speech data from the region, the other using only native speech. For the former, pitch shifting is found not to improve Word Error Rate (WER) performance or reduce bias; for the latter, augmentation improves WER performance and reduces bias for certain speaker groups.
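
For readers unfamiliar with the metric, WER is conventionally defined as (S + D + I) / N: the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the scoring code used in the thesis. The function name `word_error_rate` and the example sentences are illustrative assumptions and are not taken from JASMIN-CGN.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is a plain
    whitespace split, which real scoring pipelines usually refine.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Made-up example: one substitution ("zit" -> "zat") and one deleted "de"
# against a six-word reference.
example_wer = word_error_rate("de kat zit op de mat", "de kat zat op mat")
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words. Comparing WER between speaker groups is one common way to quantify bias, although the exact bias measure used in the thesis is not stated in this abstract.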

Files

RP_Amar_final_v2.pdf

(pdf | 0.694 Mb)

Unknown license