Estimating Reverberation Time by a Function of Intrusive Speech Intelligibility Measures
Abstract
A Room Impulse Response (RIR) is a mathematical model of sound propagation in a room. Estimating RIR parameters such as the reverberation time (T60) allows Automatic Speech Recognition (ASR) systems to adapt their behavior to reverberation in the input signal. Currently, machine learning techniques provide the most accurate T60 estimates. We propose a novel methodology that uses intrusive Speech Intelligibility Measures (SIMs) beyond their traditional application. In this study we use SIIB, SIIB^Gauss, STOI and ESTOI as SIMs. For each SIM we find a best-fit curve with respect to T60 using a statistical approach. The statistical analysis is applied to simulated RIRs obtained with the Image Source Method. The estimator based on SIIB^Gauss achieves the lowest Mean Squared Error, 0.353, on simulated data. Although this does not outperform state-of-the-art models, we offer recommendations for possible improvements. Preliminary experiments suggest that enhancing noise robustness is crucial and that the estimators could be generalized to real-world scenarios; further research is necessary to confirm this.
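The curve-fitting idea summarized above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the exponential relation between the SIM score and T60 is a hypothetical placeholder, and the cubic polynomial is just one possible family of best-fit curves, not necessarily the one used in the study. The helper names `fit_t60_estimator` and `mean_squared_error` are likewise hypothetical. In practice the SIM scores would be computed by comparing clean speech with its reverberant version for each simulated RIR.

```python
import numpy as np


def fit_t60_estimator(sim_scores, t60_values, degree=3):
    """Fit T60 as a polynomial function of a SIM score (least squares).

    The polynomial form and its degree are illustrative assumptions.
    """
    coeffs = np.polyfit(sim_scores, t60_values, degree)
    return np.poly1d(coeffs)


def mean_squared_error(estimator, sim_scores, t60_values):
    """Mean squared error between estimated and reference T60 values."""
    residuals = estimator(sim_scores) - t60_values
    return float(np.mean(residuals ** 2))


# Synthetic stand-in data (NOT results from the study).
rng = np.random.default_rng(0)
t60_true = rng.uniform(0.2, 1.5, size=500)  # reference T60 in seconds

# Hypothetical placeholder relation: the score drops as reverberation grows,
# with a small amount of measurement noise added.
sim_scores = np.exp(-1.5 * t60_true) + rng.normal(0.0, 0.01, size=500)

estimator = fit_t60_estimator(sim_scores, t60_true)
mse = mean_squared_error(estimator, sim_scores, t60_true)
print(f"MSE on synthetic data: {mse:.4f}")
```

Fitting T60 directly as a function of the SIM score avoids having to invert the curve at estimation time. A separate held-out set would be needed to judge how well such an estimator generalizes.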