BaSAL: Class Balanced Warm Start Active Learning for LiDAR Semantic Segmentation
Abstract
Active learning has been proposed as a way to reduce the expensive and time-consuming process of annotating large-scale autonomous driving datasets. The process typically involves a model initialization phase, followed by multiple iterations that select the most informative data based on the initial model. However, we find two problems that have not been explicitly addressed in this process. First, current large-scale autonomous driving datasets suffer from class imbalance, yet no strategy has been specifically designed to address this issue. Second, selecting the initial data from an entirely unlabeled pool, commonly referred to as the cold start problem of active learning, remains challenging. In this study, we propose a Class Balanced Warm Start Active Learning for LiDAR Semantic Segmentation (BaSAL) framework to address these problems. Our framework introduces a novel size-based clustering pipeline whose basic query unit is a size-based cluster for non-ground points or a grid cell for ground points. Cluster sizes are heavily correlated with semantic classes, which allows us to control the class distribution of the selected data more actively. We also propose a warm start strategy to alleviate the cold start problem. Unlike the commonly used random selection of point cloud scans for model initialization, our warm start strategy selects data from the basic query units and improves the initial model by a large margin. Experiments show that our approach achieves over 95% of the performance of fully supervised learning while using only 5% of the data, outperforming existing active learning methods on SemanticKITTI [4] and performing on par with the state-of-the-art method on nuScenes [7].
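To make the idea of controlling class distribution through cluster size more concrete, the following is a minimal sketch and not the authors' pipeline. It assumes the non-ground points have already been clustered and that only the point count of each cluster is known. The function name `size_balanced_selection`, the log-spaced binning, and the parameters `n_bins` and `per_bin` are all hypothetical choices made for illustration; the abstract does not specify how BaSAL partitions or samples clusters.

```python
import numpy as np

def size_balanced_selection(cluster_sizes, n_bins=4, per_bin=2, seed=0):
    """Pick up to `per_bin` clusters from each size bin (hypothetical helper).

    cluster_sizes: positive point counts, one per cluster.
    Returns the sorted indices of the selected clusters and the bin id
    assigned to every cluster.
    """
    sizes = np.asarray(cluster_sizes, dtype=float)
    rng = np.random.default_rng(seed)

    # Assumption: log-spaced bins, because cluster sizes usually span
    # several orders of magnitude (a pole versus a building facade).
    edges = np.logspace(np.log10(sizes.min()), np.log10(sizes.max()), n_bins + 1)

    # Use only the interior edges so that bin ids fall in 0..n_bins-1.
    bin_ids = np.digitize(sizes, edges[1:-1])

    selected = []
    for b in range(n_bins):
        members = np.flatnonzero(bin_ids == b)
        if members.size == 0:
            continue  # no cluster of this size range in the pool
        take = min(per_bin, members.size)
        selected.extend(rng.choice(members, size=take, replace=False).tolist())
    return sorted(selected), bin_ids


# Toy pool: many small clusters, few large ones.
sizes = [5, 6, 7, 8, 50, 60, 500, 600, 700, 5000]
chosen, bins = size_balanced_selection(sizes, n_bins=4, per_bin=2)
print(chosen, bins[chosen])
```

Sampling the same number of units from every size bin, rather than uniformly from the whole pool, keeps frequent small clusters from dominating the selection. If size is a good proxy for semantic class, as the abstract states, this should push the labeled set toward a flatter class distribution.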