O.E. Scharenborg | TU Delft Repository

Using articulated speech EEG signals for imagined speech decoding

Conference paper (2024) - Chris Bras (author), T.B. Patel (author), Tanvina Patel (author), O.E. Scharenborg (author), Odette Scharenborg (author)

Brain-Computer Interfaces (BCIs) open avenues for communication among individuals unable to use voice or gestures. Silent speech interfaces are one such approach for BCIs that could offer a transformative means of connecting with the external world. Performance on imagined speech ...

Finding Spoken Identifications

Using GPT-4 Annotation For An Efficient And Fast Dataset Creation Pipeline

Conference paper (2024) - Maliha Jahan (author), Helin Wang (author), Thomas Thebaud (author), Yinglun Sun (author), Giang Le (author), Zsuzsanna Fagyal (author), O.E. Scharenborg (author), Odette Scharenborg (author), Mark Hasegawa-Johnson (author), Laureano Moro-Velázquez (author), Laureano Moro-Velazquez (author), Najim Dehak (author)

The growing emphasis on fairness in speech-processing tasks requires datasets with speakers from diverse subgroups that allow training and evaluating fair speech technology systems. However, creating such datasets through manual annotation can be costly. To address this challenge ...

Improving End-to-End Models for Children’s Speech Recognition

Journal article (2024) - T.B. Patel (author), Tanvina Patel (author), O.E. Scharenborg (author), Odette Scharenborg (author)

Children’s Speech Recognition (CSR) is a challenging task due to the high variability in children’s speech patterns and limited amount of available annotated children’s speech data. We aim to improve CSR in the often-occurring scenario that no children’s speech data is available ...

Improving child speech recognition with augmented child-like speech

Conference paper (2024) - Y. Zhang (author), Z. Yue (author), T.B. Patel (author), Tanvina Patel (author), O.E. Scharenborg (author), Odette Scharenborg (author)

State-of-the-art ASRs show suboptimal performance for child speech. The scarcity of child speech limits the development of child speech recognition (CSR). Therefore, we studied child-to-child voice conversion (VC) from existing child speakers in the dataset and additional (new) c ...

BIAS in Flemish automatic speech recognition

Conference paper (2023) - Aaricia Herygers (author), Vass Verkhodanova (author), Matt Coler (author), O.E. Scharenborg (author), Odette Scharenborg (author), Munir Georges (author), Munir Georges (author)

Research has shown that automatic speech recognition (ASR) systems exhibit biases against different speaker groups, e.g., based on age or gender. This paper presents an investigation into bias in recent Flemish ASR. Seeing as Belgian Dutch, which is also known as Flemish, is ofte ...

Improving Adaptive Learning Models Using Prosodic Speech Features

Conference paper (2023) - Thomas Wilschut (author), Florian Sense (author), O.E. Scharenborg (author), Odette Scharenborg (author), Hedderik van Rijn (author)

Cognitive models of memory retrieval aim to describe human learning and forgetting over time. Such models have been successfully applied in digital systems that aid in memorizing information by adapting to the needs of individual learners. The memory models used in these systems ...

Improving Whispered Speech Recognition Performance Using Pseudo-Whispered Based Data Augmentation

Conference paper (2023) - Zhaofeng Lin (author), T.B. Patel (author), Tanvina Patel (author), O.E. Scharenborg (author), Odette Scharenborg (author)

Whispering is a distinct form of speech known for its soft, breathy, and hushed characteristics, often used for private communication. The acoustic characteristics of whispered speech differ substantially from normally phonated speech and the scarcity of adequate training data le ...

The Multimodal Information Based Speech Processing (Misp) 2022 Challenge

Audio-Visual Diarization And Recognition

Conference paper (2023) - Zhe Wang (author), Shilong Wu (author), Diyuan Liu (author), More Authors..., Hang Chen (author), Hang Chen (author), Mao-Kui He (author), Jun Du (author), Chin-Hui Lee (author), Jingdong Chen (author), Shinji Watanabe (author), Sabato Marco Siniscalchi (author), Sabato Marco Siniscalchi (author), O.E. Scharenborg (author), Odette Scharenborg (author)

The Multi-modal Information based Speech Processing (MISP) challenge aims to extend the application of signal processing technology in specific scenarios by promoting the research into wake-up words, speaker diarization, speech recognition, and other technologies. The MISP2022 ch ...

DAIS: The Delft Database of EEG Recordings of Dutch Articulated and Imagined Speech

Conference paper (2023) - Bo Dekker (author), A.C. Schouten (author), Alfred Schouten (author), Alfred C. Schouten (author), A. C. Schouten (author), O.E. Scharenborg (author), Odette Scharenborg (author)

Silent speech interfaces could enable people who lost the ability to use their voice or gestures to communicate with the external world, e.g., through decoding the person’s brain signals when imagining speech. Only a few and small databases exist that allow for the development an ...

Towards inclusive automatic speech recognition

Journal article (2023) - S. Feng (author), B.M. Halpern (author), B.M. Halpern (author), B.M. Halpern (author), Bence M. Halpern (author), Bence M. Halpern (author), Bence M. Halpern (author), O. Kudina (author), O.E. Scharenborg (author), Odette Scharenborg (author)

Practice and recent evidence show that state-of-the-art (SotA) automatic speech recognition (ASR) systems do not perform equally well for all speaker groups. Many factors can cause this bias against different speaker groups. This paper, for the first time, systematically quantifi ...

AnyoneNet

Synchronized Speech and Talking Head Generation for Arbitrary Persons

Journal article (2023) - X. Wang (author), X. Wang (author), X. Wang (author), Qicong Xie (author), Lei Xie (author), Jihua Zhu (author), O.E. Scharenborg (author), Odette Scharenborg (author)

Automatically generating videos in which synthesized speech is synchronized with lip movements in a talking head has great potential in many human-computer interaction scenarios. In this paper, we present an automatic method to generate synchronized speech and talking-head videos ...

Automatic evaluation of spontaneous oral cancer speech using ratings from naive listeners

Journal article (2023) - B.M. Halpern (author), B.M. Halpern (author), B.M. Halpern (author), Bence M. Halpern (author), Bence M. Halpern (author), Bence M. Halpern (author), S. Feng (author), Rob van Son (author), Rob van Son (author), Rob J.J.H. van Son (author), Rob J.J.H. van Son (author), Michiel W.M. van den Brekel (author), Michiel W.M. van den Brekel (author), Odette Scharenborg (author), O.E. Scharenborg (author)

In this paper, we build and compare multiple speech systems for the automatic evaluation of the severity of a speech impairment due to oral cancer, based on spontaneous speech. To be able to build and evaluate such systems, we collected a new spontaneous oral cancer speech corpus ...

Neural representations of non-native speech reflect proficiency and interference from native language knowledge

Journal article (2023) - Christian Brodbeck (author), Katerina Danae Kandylaki (author), O.E. Scharenborg (author), Odette Scharenborg (author)

Learning to process speech in a foreign language involves learning new representations for mapping the auditory signal to linguistic structure. Behavioral experiments suggest that even listeners that are highly proficient in a non-native language experience interference from repr ...

Exploring Data Augmentation in Bias Mitigation Against Non-Native-Accented Speech

Conference paper (2023) - Y. Zhang (author), Aaricia Herygers (author), T.B. Patel (author), Tanvina Patel (author), Z. Yue (author), O.E. Scharenborg (author), Odette Scharenborg (author)

Automatic speech recognition (ASR) should serve every speaker, not only the majority “standard” speakers of a language. In order to build inclusive ASR, mitigating the bias against speaker groups who speak in a “non-standard” or “diverse” way is crucial. We aim to mitigate the bi ...

The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

Journal article (2022) - Luke Prananta (author), B.M. Halpern (author), B.M. Halpern (author), B.M. Halpern (author), Bence M. Halpern (author), Bence M. Halpern (author), Bence M. Halpern (author), S. Feng (author), O.E. Scharenborg (author), Odette Scharenborg (author)

In this paper, we investigate several existing and a new state-of-the-art generative adversarial network-based (GAN) voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition. We compare key components of existing methods as part of a rigo ...

The Presence of Background Noise Extends the Competitor Space in Native and Non-Native Spoken-Word Recognition

Insights from Computational Modeling

Journal article (2022) - Themis Karaminis (author), Florian Hintz (author), O.E. Scharenborg (author), Odette Scharenborg (author)

Oral communication often takes place in noisy environments, which challenge spoken-word recognition. Previous research has suggested that the presence of background noise extends the number of candidate words competing with the target word for recognition and that this extension ...

Comparing data augmentation and training techniques to reduce bias against non-native accents in hybrid speech recognition systems

Conference paper (2022) - Yixuan Zhang (author), Y. Zhang (author), T.B. Patel (author), Tanvina Patel (author), O.E. Scharenborg (author), Odette Scharenborg (author)

One important problem that needs tackling for wide deployment of Automatic Speech Recognition (ASR) is the bias in ASR, i.e., ASRs tend to generate more accurate predictions for certain speaker groups while making more errors on speech from other groups. We aim to reduce bias aga ...

Mitigating bias against non-native accents

Journal article (2022) - Y. Zhang (author), Yixuan Zhang (author), Bence M. Halpern (author), Bence M. Halpern (author), Bence M. Halpern (author), B.M. Halpern (author), B.M. Halpern (author), B.M. Halpern (author), Tanvina Patel (author), T.B. Patel (author), O.E. Scharenborg (author), Odette Scharenborg (author)

Automatic speech recognition (ASR) systems have seen substantial improvements in the past decade; however, not for all speaker groups. Recent research shows that bias exists against different types of speech, including non-native accents, in state-of-the-art (SOTA) ASR systems. T ...

Audio-Visual Wake Word Spotting in MISP2021 Challenge

Dataset Release and Deep Analysis

Journal article (2022) - Hengshun Zhou (author), Jun Du (author), Gongzhen Zou (author), Zhaoxu Nian (author), Chin-Hui Lee (author), Sabato Marco Siniscalchi (author), Sabato Marco Siniscalchi (author), Shinji Watanabe (author), O.E. Scharenborg (author), Odette Scharenborg (author), Jingdong Chen (author), More Authors...

In this paper, we describe and release publicly the audio-visual wake word spotting (WWS) database in the MISP2021 Challenge, which covers a range of scenarios of audio and video data collected by near-, mid-, and far-field microphone arrays, and cameras, to create a shared and p ...

Using cross-model learnings for the Gram Vaani ASR Challenge 2022

Journal article (2022) - T.B. Patel (author), Tanvina Patel (author), O.E. Scharenborg (author), Odette Scharenborg (author)

In the diverse and multilingual land of India, Hindi is spoken as a first language by a majority of its population. Efforts are made to obtain data in terms of audio, transcriptions, dictionary, etc. to develop speech-technology applications in Hindi. Similarly, the Gram-Vaani AS ...