Localization is the problem of "where we are". Localization techniques help people understand their surrounding environment based on position information extracted in a geographic reference map. The development of global navigation satellite systems (GNSS), light detection and ranging (LiDAR), computer vision (CV), etc., enables us to apply localization techniques to more specific tasks. Autonomous driving and robotics need a reliable localization technique that instantly retrieves both an accurate position and detailed information about the environment, which is challenging to realize in an urban environment. In our research, we propose a 3D point cloud localization framework for the urban environment that localizes terrestrial laser scanning (TLS) static scans in mobile laser scanning (MLS) point clouds. 3D point cloud localization consists of two steps: place recognition, which finds the location of a TLS point cloud in a large MLS point cloud, and pose refinement, which accurately aligns the TLS and corresponding MLS point clouds.

In place recognition, about 100 cylinder-like objects with 2D coordinate information are retrieved per million points in a TLS scan or an MLS scene, taking nearly 15 seconds. All five TLS static scans in the experiment are successfully matched to their corresponding MLS scenes by Gaussian mixture model (GMM) probabilistic estimation based on the extracted cylinder-like objects. The results indicate that place recognition can be realized by designing an object feature descriptor instead of a fine point feature descriptor, even though the TLS and MLS point clouds were collected at different times (2016 and 2020). The cylinder-like object feature descriptor also saves time: point feature extraction takes more than 100 seconds per million points.

In pose refinement, the point clouds of TLS scans and corresponding MLS scenes are resampled and fed to a neural network consisting of a feature extraction block (FEB), a correspondence search block (CSB) and a pose estimation block (PEB). The output of the neural network is then tuned by a global refinement process. The accuracy (ε_r, ε_t), expressed as the mean rotation error ε_r in degrees (deg) and the mean translation error ε_t in meters (m), is (0.25 deg, 0.88 m) when the ground truth varies from (0, 0, 0) to (10 deg, 10 deg, 30 deg) in rotation and from (0, 0, 0) to (30 m, 30 m, 10 m) in translation w.r.t. the (x, y, z) axes. For comparison, the accuracy of a point-based registration neural network, point-to-point iterative closest point (ICP) and coherent point drift (CPD) is (∼10 deg, ∼10 m), (0.27 deg, 0.95 m) and (0.40 deg, 1.94 m), respectively. The run time for processing a million points with our neural network, a point-based neural network, point-to-point ICP and CPD is ∼5 seconds, ∼3 seconds, ∼50 seconds and >100 seconds, respectively. The results show that current point-based neural networks do not work in an urban area, whereas our neural network achieves more accurate results than the traditional registration methods. Moreover, a neural network for pose refinement needs no prior information and is more efficient and more general than traditional registration methods. Overall, our research contributes a general framework for 3D point cloud localization that incorporates both a traditional feature extraction method and a novel neural network.
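To make the GMM-based matching step concrete, the sketch below is a minimal, hypothetical reconstruction in Python with scikit-learn, not the thesis's exact formulation: it fits a 2D Gaussian mixture to the (x, y) centroids of cylinder-like objects in each candidate MLS scene and ranks scenes by the mean log-likelihood of the TLS objects. The function names, the number of mixture components, and the assumption that TLS coordinates are already expressed in the map frame (e.g. via a coarse GNSS prior) are ours.

```python
# Hypothetical sketch of GMM-based place recognition over cylinder-like
# object centroids; the actual method may pose the matching differently.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_scene_model(object_xy, n_components=8, seed=0):
    """Fit a 2D GMM to the (x, y) centroids of cylinder-like objects
    extracted from one MLS scene."""
    gmm = GaussianMixture(n_components=min(n_components, len(object_xy)),
                          covariance_type="full", random_state=seed)
    gmm.fit(object_xy)
    return gmm

def recognize_place(tls_xy, mls_scenes):
    """Return the index of the MLS scene whose object layout best explains
    the TLS objects, measured by mean log-likelihood per object.
    Assumes tls_xy is already in the MLS map frame."""
    scores = [fit_scene_model(xy).score(tls_xy) for xy in mls_scenes]
    return int(np.argmax(scores)), scores
```

Under this sketch, recognize_place simply picks the best-scoring candidate scene; a joint alignment of the mixtures would be a natural extension.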
We hope this localization framework can be extended beyond the urban environment to other scenarios, as well as to further applications such as urban reconstruction.
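For reference, the error metrics (ε_r, ε_t) and the point-to-point ICP baseline quoted above can be sketched with Open3D as follows. This is a minimal sketch under our own assumptions: the file paths, the 1 m correspondence threshold and the identity ground truth are placeholders, and the exact error definitions used in the evaluation may differ.

```python
import numpy as np
import open3d as o3d

def pose_errors(T_est, T_gt):
    """Rotation error in degrees (geodesic distance between rotation
    matrices) and translation error in meters for one scan pair."""
    R_delta = T_gt[:3, :3].T @ T_est[:3, :3]
    cos_angle = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
    eps_r = np.degrees(np.arccos(cos_angle))
    eps_t = np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3])
    return eps_r, eps_t

# Point-to-point ICP baseline; paths and threshold are placeholders.
src = o3d.io.read_point_cloud("tls_scan.pcd")    # hypothetical file
tgt = o3d.io.read_point_cloud("mls_scene.pcd")   # hypothetical file
reg = o3d.pipelines.registration.registration_icp(
    src, tgt,
    max_correspondence_distance=1.0,  # meters, tune per scene
    init=np.eye(4),                   # could be the place-recognition result
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
print(pose_errors(reg.transformation, np.eye(4)))  # np.eye(4): stand-in ground truth
```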