Fusing Bird’s Eye View Map Encoding With Simulated Sounds for Generalizable Non-Line-Of-Sight Vehicle Detection

Abstract

Detecting nearby vehicles involves utilizing data from various sensors installed on a car as it moves. Common sensors for this task include LiDAR, cameras, and RADAR. However, all of these sensors share the same limitation: they cannot detect an approaching vehicle that is not yet visible. This thesis therefore explores the potential of a microphone array, a sensor capable of detecting vehicles that are out of sight. A review of prior research on detecting occluded vehicles using sound reveals an existing model capable of detecting vehicles approaching from behind blind corners. However, because the local geometry around the ego vehicle affects the perceived sound patterns, that model was designed to work only within a specific set of T-junctions. This thesis therefore takes a step further and aims to develop a detection model capable of detecting vehicles behind blind corners in environments not included in the model's training set.

This is challenging for multiple reasons. First, the literature review revealed a lack of suitable datasets comprising sounds of vehicles approaching from behind blind corners at various road junctions. Second, microphones, like other sensors, come with limitations: sound inherently provides less spatial information than sensors commonly used in autonomous driving, such as LiDAR or cameras. Because sound propagation varies with road junction geometry, building a model that generalizes across diverse junction types is therefore difficult. To overcome the data scarcity and sound's limited spatial information, this study investigates whether acoustic responses simulated in artificial road environments can serve as training data for real-world vehicle detection. To complement sound's inherent advantage of detecting objects that are out of sight, the thesis additionally proposes a Bird's Eye View (BEV) encoding of the top-down map from the driving vehicle's perspective. Such an encoding of the current driving environment allows a detection model to anticipate the sound signatures commonly observed in a given setting.

The assessment of the acoustic simulations did not identify a single configuration of simulation parameters that produces realistic propagation of an approaching vehicle's sound across all considered junction types. However, junction-specific parameter choices were observed to yield realistic sound propagation within a given junction. Furthermore, evaluating the novel BEV encoding within the proposed acoustic detection pipeline demonstrated performance equal to or better than that of a model relying on sound alone. Overall, this research underscores the potential of incorporating BEV encodings in non-line-of-sight acoustic detection, suggests the promise of acoustic simulations in this field, and contributes to advancing the integration of sound as an additional data modality in vehicle detection.
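To make the simulation idea concrete, the sketch below generates synthetic microphone recordings of a hidden source behind a blind corner using geometric acoustics. This is a minimal illustration, not the thesis's actual pipeline: pyroomacoustics is a stand-in library choice, the L-shaped footprint, absorption value, source signal, and microphone placement are all hypothetical, and treating an open street as an extruded polygonal "room" is only a crude proxy for outdoor propagation.

```python
# Hypothetical sketch: simulate a recording of an occluded source behind a
# blind corner with the image-source model. All geometry and parameters are
# illustrative assumptions, not values from the thesis.
import numpy as np
import pyroomacoustics as pra

fs = 16000
# L-shaped "street" footprint (metres): two corridors meeting at a blind corner.
corners = np.array(
    [[0, 0], [10, 0], [10, 4], [4, 4], [4, 10], [0, 10]]
).T  # shape (2, n_corners)

room = pra.Room.from_corners(
    corners, fs=fs, max_order=6, materials=pra.Material(0.2)
)
room.extrude(8.0)  # extrude the footprint to an assumed building height

# Hidden "vehicle" source in one arm, microphone pair in the other:
# no direct line of sight around the inner corner at (4, 4).
room.add_source([2.0, 8.0, 1.0], signal=np.random.randn(fs))  # 1 s of noise
mics = np.array([[7.9, 8.1], [2.0, 2.0], [1.5, 1.5]])  # (3, n_mics)
room.add_microphone_array(pra.MicrophoneArray(mics, fs))

room.compute_rir()              # impulse responses via the image-source model
room.simulate()                 # convolve the source signal with each RIR
audio = room.mic_array.signals  # (n_mics, n_samples): simulated recording
print(audio.shape)
```

Sweeping parameters such as the reflection order and wall absorption per junction type mirrors the abstract's finding that no single configuration is realistic everywhere, while junction-specific settings can be.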
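The fusion of the BEV map encoding with the acoustic input could take many forms; the following is a minimal sketch of one plausible late-fusion design, assuming per-microphone spectrograms and a rasterized single-channel top-down map as inputs. The module names, dimensions, and the three-way output (e.g. no vehicle, approaching from the left, approaching from the right) are hypothetical and do not describe the thesis's actual architecture.

```python
# Minimal sketch of a sound + BEV late-fusion classifier (hypothetical design).
import torch
import torch.nn as nn

class BEVSoundFusionClassifier(nn.Module):
    def __init__(self, n_mics: int = 8, n_classes: int = 3):
        super().__init__()
        # Audio branch: small CNN over stacked per-microphone spectrograms.
        self.audio_encoder = nn.Sequential(
            nn.Conv2d(n_mics, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, 64)
        )
        # BEV branch: encodes a 1-channel occupancy-style top-down map,
        # giving the model a prior on the local junction geometry.
        self.bev_encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, 32)
        )
        # Late fusion by concatenation, then a small classification head.
        self.head = nn.Sequential(
            nn.Linear(64 + 32, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, spectrograms: torch.Tensor, bev_map: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.audio_encoder(spectrograms),
                           self.bev_encoder(bev_map)], dim=1)
        return self.head(fused)

# Usage: a batch of 8-mic spectrograms (freq x time) and 64x64 BEV maps.
model = BEVSoundFusionClassifier()
logits = model(torch.randn(4, 8, 128, 96), torch.randn(4, 1, 64, 64))
print(logits.shape)  # torch.Size([4, 3])
```

Dropping the BEV branch reduces this to a sound-only baseline, which is the comparison the abstract reports: the fused model performed at least as well as sound alone.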