Height Inference for all USA Building Footprints in the Absence of Height Data

Abstract

In recent years, the demand for 3D spatial information and 3D city models has increased, as they support many applications, e.g. noise simulations, energy demand estimations, and shadow analysis. Constructing a city model with 3D buildings requires elevation data (such as LiDAR or Digital Terrain Models), but data of sufficient quality is often unavailable. This thesis focuses on the use of machine learning methods to estimate the height of building footprints, thus bypassing the use of elevation data completely. Three different methods are tested and compared: Random Forest Regression (RFR), Multiple Linear Regression (MLR), and Support Vector Regression (SVR). A case study is performed for the conterminous United States of America (USA), because a nationwide building dataset containing roughly 125 million building footprints is available for it. To account for the high diversity in urban layouts, a distinction is made between Central Business Districts (CBDs) in cities and all other regions (e.g. suburbs and rural areas). All building footprints are characterised by nine features derived from their geometry, which are then used (in several combinations) in the model training and prediction stages. Furthermore, the influence of additional features, including census and cadastral data, on the building height predictions is analysed for the city of Denver, Colorado. The experiments show that it is feasible to predict the height of all buildings in the conterminous USA in under 6 minutes; the MLR and SVR methods even accomplish it in under 30 seconds. The height prediction results show that the different prediction models struggle to accurately estimate the height of buildings in CBDs. There, the lowest Mean Absolute Error (MAE) achieved is 31.81 m, whereas for the suburban and rural areas it is 1.41 m. Adding non-geometric features (e.g. census data) to the prediction models for one city (Denver) proved successful: the RFR method reduced its MAE from 1.35 m to 0.96 m for the suburbs, achieving sub-metre accuracy. The CBDs, however, remain problematic, with an MAE of 16.87 m. These results show that for the suburban and rural areas, the accuracy recommendation from the CityGML specifications for LOD1 models (5 m limit) can be met. For the CBDs, improvement is required. The experiments also show that the proposed methodology can be used to generate 3D city models from very large datasets when no elevation data is available. Moreover, the method is, in theory, generic enough to be applied outside the USA.
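
To make the accuracy figures above concrete, the following is a minimal sketch of how an MAE could be computed per region and compared against the 5 m LOD1 limit. It is not the thesis's code: the function names, the constant name, and the sample heights are all hypothetical and chosen only for illustration, and it assumes the limit is checked directly against the MAE.

```python
from statistics import fmean

# 5 m accuracy limit for LOD1 models, as quoted in the abstract.
LOD1_HEIGHT_LIMIT_M = 5.0


def mean_absolute_error(reference, predicted):
    """Mean of |reference - predicted| over paired building heights (metres)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    return fmean(abs(r - p) for r, p in zip(reference, predicted))


def meets_lod1_limit(mae_m, limit_m=LOD1_HEIGHT_LIMIT_M):
    """True if the MAE is within the given limit (assumed comparison rule)."""
    return mae_m <= limit_m


# Made-up heights in metres, NOT data from the thesis.
suburb_ref = [6.0, 7.5, 5.0, 9.0]
suburb_pred = [7.0, 6.5, 6.0, 8.0]
cbd_ref = [120.0, 45.0, 80.0, 200.0]
cbd_pred = [90.0, 60.0, 50.0, 150.0]

for name, ref, pred in [("suburb", suburb_ref, suburb_pred),
                        ("CBD", cbd_ref, cbd_pred)]:
    mae = mean_absolute_error(ref, pred)
    print(f"{name}: MAE = {mae:.2f} m, within LOD1 limit: {meets_lod1_limit(mae)}")
```

With these illustrative numbers the suburban sample falls within the limit and the CBD sample does not, mirroring the qualitative pattern the abstract reports.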