CNN-based Roofing Material Segmentation using Aerial Imagery and LiDAR Data Fusion

Abstract

Roofing material classification is becoming pivotal in urban decision-making, supporting processes such as asbestos mapping, disaster preparation, and urban heat island detection. However, research has mainly focused on a restricted set of materials. In addition, many studies rely on expensive multi- or hyperspectral imagery and often use outdated or ineffective classification methods, overlooking the capabilities of modern deep learning and data fusion. This thesis explores using a convolutional neural network for pixel-level classification of roofing materials on Dutch buildings by merging standard aerial images with LiDAR data, enabling detailed mapping and addressing concerns about classification efficacy relative to image- and object-level methods.

A framework was devised to generate a new semantic segmentation dataset with over 15.5 million pixels from 200 randomly selected images nationwide, covering eight distinct materials. To facilitate material identification in unfavourable lighting conditions, true-colour aerial imagery from the BM5 dataset was combined with rasterised features extracted from the national point cloud (AHN4), specifically reflectance, slope, and planar point density. Additionally, a quasi-normalised elevation model (nDRM) was employed, based on the corresponding digital surface model and the median roof elevation of buildings in each scene, as provided by the 3DBAG dataset. The experiments used the DeepLabv3+ semantic segmentation architecture with a ResNet-18 backbone, trained end-to-end on the generated dataset. A novel stratified splitting algorithm and a weighting scheme to combat class imbalance in the training subset were also introduced.
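The data fusion described above can be illustrated with a minimal sketch. It assumes the nDRM is obtained by subtracting one scene-level reference height (here, the median of the per-building median roof elevations) from the digital surface model; the abstract does not give the exact formula, so this definition is an assumption. The function names `quasi_normalised_elevation` and `stack_channels` are hypothetical and not taken from the thesis.

```python
import numpy as np


def quasi_normalised_elevation(dsm, roof_medians):
    """Subtract a single scene-level reference height from the DSM.

    Assumption: the reference is the median of the per-building median
    roof elevations in the scene; the thesis may define it differently.
    """
    reference = float(np.median(roof_medians))
    return dsm - reference


def stack_channels(rgb, reflectance, slope, density, ndrm):
    """Fuse RGB (H, W, 3) with four LiDAR rasters (H, W) into (H, W, 7)."""
    lidar = np.stack([reflectance, slope, density, ndrm], axis=-1)
    return np.concatenate(
        [rgb.astype(np.float32), lidar.astype(np.float32)], axis=-1
    )


# Toy 2x2 scene with three buildings (all values are made up).
dsm = np.array([[10.0, 12.0], [14.0, 16.0]])
ndrm = quasi_normalised_elevation(dsm, roof_medians=[11.0, 13.0, 15.0])
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
raster = np.ones((2, 2))
fused = stack_channels(rgb, raster, raster, raster, ndrm)
```

Stacking all inputs into one multi-channel array is only one possible fusion strategy (early fusion); whether the thesis fuses the channels at the input or later in the network is not stated in this abstract.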

After thorough hyperparameter tuning, the model achieved a mean intersection over union (mIoU) of 64.68% on the test subset. Results for membranes and gravel surpassed those of almost every other study, but there were notable confusion and omission errors for light-permitting surfaces and metal. Generalising the pixel-wise material maps to roof surfaces at different levels of detail (LoDs) of the 3DBAG considerably decreased gross errors, but it could also discard some minor original predictions and therefore did not notably improve overall performance. In general, LoD1.2 was inadequate for modelling multi-material roofs of different heights. LoD1.3 improved on this but still missed small roof sections; LoD2.2 captured them but had more outliers. Additionally, an ablation study on the LiDAR-derived component of the new dataset showed that removing slope and nDRM reduced performance by 10.31% and 8.61%, respectively, while density had the least impact. All ablated features were semantically linked, suggesting they should be combined into a single dataset.
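For readers unfamiliar with the headline metric, the following is a minimal sketch of mean intersection over union computed from a confusion matrix. It assumes integer class labels and skips classes that appear in neither the ground truth nor the prediction; the thesis's exact evaluation protocol may differ, and the function name `mean_iou` is hypothetical.

```python
import numpy as np


def mean_iou(y_true, y_pred, num_classes):
    """Mean IoU over classes present in the ground truth or the prediction."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Confusion matrix: rows are true classes, columns are predicted classes.
    conf = np.bincount(
        num_classes * y_true + y_pred, minlength=num_classes**2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0  # ignore classes absent from both label maps
    return float((intersection[valid] / union[valid]).mean())


# Toy example with two classes: per-class IoU is 1/2 and 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Because every class contributes equally to the mean, rare materials with poor IoU lower the score as much as common ones, which is one reason class imbalance matters for this metric.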

The thesis demonstrates the relevance of pixel-based classification with deep learning and data fusion, provides resources for future research, and indicates areas for dataset expansion and improved annotation.