Driving is a challenging task. When people operate vehicles, they use all their senses to assess the current traffic scenario and determine appropriate actions. Sensors in autonomous driving applications aim to mimic those human senses to build a similar understanding of these complex circumstances. Most scientific attention in the autonomous driving community has been devoted to camera and LiDAR solutions, which are already capable of constructing a cohesive three-dimensional representation of the surroundings with high accuracy. In contrast, little research has gone into exploiting the auditory landscape in traffic scenarios. Given the use of sirens on emergency vehicles and upcoming regulations on minimum sound emissions for electric vehicles, acoustic perception deserves more attention. Auditory localization in particular is widely studied in the field of speaker recognition, where robust methods exist that can localize and track multiple speech sources in difficult acoustic environments. This raises the question of whether such principles can be applied in the domain of autonomous driving to make self-driving cars safer and more robust when navigating dense and complex urban environments. Especially in non-line-of-sight situations, where oncoming traffic can be hazardous and conventional systems fail, acoustic sensing may prove instrumental for early detection and localization. This study investigates whether, and to what extent, acoustic methods can complement conventional sensing and localization methods. Using the example of a T-crossing where traffic is occluded by buildings, it is shown that acoustic sensing alone is capable of detecting oncoming traffic before it enters direct line-of-sight.
To achieve this, a generic acoustic line-of-sight feature is combined with two data-driven classification concepts to infer, from the sound field surrounding the ego-vehicle, the presence of other motorized vehicles in proximity. This investigation is aided by a supplementary real-world dataset that provides around two hours of data gathered at five different locations. The performance of the methods is compared directly to a visual baseline and to other acoustic line-of-sight methods. The results demonstrate that the proposed acoustic localization concepts can detect traffic about one second earlier than conventional line-of-sight sensors. Despite the complexity of the problem, the more lightweight method in terms of parameter count proves favourable and performs better in most of the tasks, which include, among others, generalization across different environments and across a larger time horizon.