Blind Reverberation Time Estimation using A Convolutional Neural Network with Encoder

Bachelor thesis (2024)

Authors

X. Han (Faculty of Electrical Engineering, Mathematics and Computer Science)

Contributors

Jorge Martinez (mentor)

Dimme de Groot (mentor)

M.S. Pera, Web Information Systems (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:6957b936-be7a-4581-beea-4166bdff557c

Published Date

27-06-2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Accurate estimation of the reverberation time (RT60) is crucial for improving the acoustic quality of an environment, because RT60 determines how quickly sound is perceived to fade away. Traditional methods such as Sabine's equation require prior knowledge of the room, such as its volume and total surface absorption, and assume ideal conditions such as a diffuse sound field, which limits their practicality. To address these limitations, this paper explores blind RT60 estimation with a convolutional neural network (CNN) extended with a transformer-based encoder. The proposed model uses both simulated and real-world datasets, and environmental noise is added to improve robustness. The CNN-Encoder model achieves a mean squared error (MSE) as low as 0.0006 s² for pure room impulse responses (RIRs) and 0.0011 s² at a signal-to-noise ratio (SNR) of +30 dB. It also shows potential for practical use, reaching an MSE of 0.0282 s² on audio recordings. Compared with the CNN-only architecture, this approach offers a significant reduction in estimation error, demonstrating the potential for improved acoustic parameter estimation in varied environments. Future work will focus on further optimizing the model for real-world applications and on reducing computational complexity while maintaining high accuracy.
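The abstract relies on a few standard quantities: Sabine's equation, a reference RT60 for each RIR, noise added at a target SNR, and the MSE used to score estimates. The NumPy sketch below illustrates them under stated assumptions; it is not the thesis's own code. All function names are hypothetical. The thesis abstract does not say how its reference RT60 labels were obtained, so Schroeder backward integration with a T20 line fit is assumed here as one common choice, and white Gaussian noise stands in for the environmental noise the model was trained with.

```python
import numpy as np


def sabine_rt60(volume_m3: float, absorption_m2: float) -> float:
    """Sabine's equation, RT60 = 0.161 * V / A (SI units, diffuse field assumed)."""
    return 0.161 * volume_m3 / absorption_m2


def schroeder_rt60(rir, fs: float, fit_range_db=(-5.0, -25.0)) -> float:
    """RT60 from an RIR (assumed labelling method, not confirmed by the thesis):
    Schroeder backward integration, a line fit on the -5 dB to -25 dB part of
    the decay (T20), extrapolated to a 60 dB decay."""
    energy = np.asarray(rir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]              # energy decay curve
    edc = np.maximum(edc, np.finfo(float).tiny)      # avoid log10(0)
    edc_db = 10.0 * np.log10(edc / edc[0])
    upper, lower = fit_range_db
    idx = np.nonzero((edc_db <= upper) & (edc_db >= lower))[0]
    if idx.size < 2:
        raise ValueError("decay range not covered by the RIR")
    slope_db_per_s, _ = np.polyfit(idx / fs, edc_db[idx], 1)
    return -60.0 / slope_db_per_s


def add_noise_at_snr(signal, snr_db: float, rng) -> np.ndarray:
    """Add noise scaled so that 10*log10(P_signal / P_noise) equals snr_db.
    Assumption: white Gaussian noise stands in for environmental noise."""
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)
    noise = rng.standard_normal(signal.shape)
    target_p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    noise *= np.sqrt(target_p_noise / np.mean(noise ** 2))  # exact target power
    return signal + noise


def mse(pred, true) -> float:
    """Mean squared error; for RT60 values in seconds the result is in s^2."""
    diff = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16_000
    t = np.arange(fs) / fs  # 1 s of samples
    # Synthetic RIR: Gaussian noise whose energy falls by 60 dB in 0.5 s
    rir = rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / 0.5)
    noisy = add_noise_at_snr(rir, 30.0, rng)
    print(sabine_rt60(200.0, 40.0), schroeder_rt60(rir, fs), mse([0.5, 0.7], [0.4, 0.7]))
```

Because the MSE squares the errors, RT60 errors measured in seconds give an MSE in s². A synthetic RIR made of exponentially decaying noise, as in the `__main__` block, is a convenient sanity check: the Schroeder estimate should return approximately the RT60 the decay was built with.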

Files

Thesis_final_version.pdf

(PDF | 0.356 MB)

Unknown license