Interpreting Deep Learning Models for Traffic Forecast
A Case Study of UNet
Abstract
Deep learning (DL) models have shown strong predictive power in solving traffic problems in recent years. However, due to their lack of interpretability and transparency, applications of such models are sometimes controversial. To ensure trust in a model, it is crucial for end-users and decision-makers to understand how it captures the spatiotemporal dynamics of traffic states and how it utilizes contextual factors. Inspired by feature attribution approaches from the field of Explainable Artificial Intelligence, we propose a pipeline to interpret how the UNet model, a winning DL model in the image-based short-term traffic forecasting competition, learns various dependencies from data for traffic forecasting. By extending the classical saliency map method, we obtain an attribution value for each element of the input data, quantifying its influence on the prediction. Moreover, we propose multiple aggregations over the raw pixel-level attribution results, enhancing their interpretability for traffic forecasting applications. Our results show that the most recent input timestamp consistently contributes the most to predicting both traffic volume and speed. The global and local spatial dependencies captured by the UNet model follow a pattern akin to the road network, yet they differ between urban and rural contexts. Finally, we compare the contributions of static contextual factors and dynamic features: although they share similar spatial boundaries of influence, the contextual factors contribute less than the speed and volume features. This study provides a systematic analytical pipeline for understanding how a DL model utilizes the spatial, temporal, and contextual dependencies in traffic data for forecasting, assisting decision-making for intelligent transportation systems.
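Below is a minimal sketch in PyTorch of the gradient-based saliency attribution step described in the abstract, together with one example aggregation over the raw pixel-level results. The tensor shapes, the stacking of timestamps and channels into the input, and the per-channel aggregation axis are illustrative assumptions, not the authors' exact setup.

```python
# A minimal sketch of saliency-based feature attribution for an image-based
# traffic forecasting model. All shapes and names here are assumptions for
# illustration only.
import torch


def saliency_attribution(model, x, target_index):
    """Gradient of one predicted output element w.r.t. every input element.

    x: input tensor of shape (1, T*C, H, W) -- e.g. past timestamps and
       channels (volume, speed, static context) stacked along dim 1.
    target_index: tuple selecting a single element of the model output,
       e.g. (0, channel, row, col).
    """
    x = x.clone().detach().requires_grad_(True)
    y = model(x)                   # forward pass
    y[target_index].backward()     # backprop from one output pixel
    return x.grad.detach().abs()   # absolute gradients as attribution values


def per_channel_contribution(attr):
    """Aggregate pixel-level attributions per input channel (e.g. per
    timestamp) to compare their overall contributions."""
    return attr.sum(dim=(0, 2, 3))  # shape: (T*C,)


# Hypothetical usage:
# attr = saliency_attribution(unet, sample, target_index=(0, 0, 128, 96))
# print(per_channel_contribution(attr))
```

Summing attributions over spatial dimensions, as in `per_channel_contribution`, is one plausible way to obtain temporal or feature-wise contribution scores of the kind the abstract reports; spatial aggregations (e.g. over urban versus rural masks) would follow the same pattern along different axes.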