LeAP: Label any Pointcloud in any domain using Foundation Models

Abstract

3D semantic understanding is essential for a wide range of robotics applications. Availability of datasets is a strong driver for research, and whilst obtaining unlabeled data is straightforward, manually annotating this data with semantic labels is time-consuming and costly. Recently, foundation models have facilitated open-set semantic segmentation, potentially aiding automatic labeling. However, these models have largely been limited to 2D images. This work introduces Label Any Pointcloud (LeAP), which leverages 2D Vision Foundation Models (VFMs) to automatically label 3D data with any set of classes in any kind of application. VFMs are used to create image labels for the desired classes, which are then projected to 3D points. Using Bayesian updates, point-wise labels are combined into voxels to improve label consistency and to label points outside the camera frustum. A novel Cross-Modal Self-Training (CM-ST) approach further enhances label quality. Through extensive experiments, we demonstrate that our method can generate high-quality 3D semantic labels across diverse fields without any manual 3D labeling. Models adapted to new application domains using our labels show up to 3.7× (12.9 → 47.1) mIoU improvement compared to the unadapted baselines. This ability to provide labels for any domain can help accelerate 3D perception research.
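The abstract describes fusing per-point 2D labels into voxels with Bayesian updates so that points sharing a voxel, including those outside the camera frustum, receive a consistent label. The sketch below illustrates one way such a voxel-level fusion could look; the class count, voxel size, function names, and uniform prior are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of voxel-level Bayesian label fusion, assuming per-point class
# probabilities from a 2D VFM have already been projected onto the point cloud.
import numpy as np

NUM_CLASSES = 4    # assumed number of semantic classes
VOXEL_SIZE = 0.2   # assumed voxel edge length in metres


def bayesian_voxel_labels(points, point_probs, in_frustum):
    """Fuse per-point class probabilities into per-voxel posteriors.

    points:      (N, 3) point coordinates in a common frame
    point_probs: (N, C) class probabilities for points observed by the camera
    in_frustum:  (N,) boolean mask of points with a valid 2D observation
    Returns a label per point; points outside the frustum inherit the label
    of the voxel they share with observed points.
    """
    # Assign every point to a voxel and build a compact voxel index.
    voxel_coords = np.floor(points / VOXEL_SIZE).astype(np.int64)
    _, inverse = np.unique(voxel_coords, axis=0, return_inverse=True)
    num_voxels = inverse.max() + 1

    # Uniform prior in log-space; each observation multiplies the posterior
    # by its likelihood, i.e. adds its log-probability (Bayesian update).
    log_post = np.full((num_voxels, NUM_CLASSES), np.log(1.0 / NUM_CLASSES))
    observed = np.where(in_frustum)[0]
    np.add.at(log_post, inverse[observed], np.log(point_probs[observed] + 1e-9))

    voxel_label = log_post.argmax(axis=1)
    return voxel_label[inverse]


# Usage example with random data: 1000 points, 60% observed by the camera.
rng = np.random.default_rng(0)
pts = rng.uniform(0, 5, size=(1000, 3))
probs = rng.dirichlet(np.ones(NUM_CLASSES), size=1000)
mask = rng.random(1000) < 0.6
labels = bayesian_voxel_labels(pts, probs, mask)
print(labels.shape)  # (1000,) — every point gets a label, observed or not
```

The key design point the abstract implies is that aggregating evidence at the voxel level both smooths inconsistent per-point predictions and transfers labels to points the camera never saw.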

Files

Thesis_Paper_Simon_Gebraad.pdf
Unknown license

File under embargo until 09-09-2026