LeAP: Label any Pointcloud in any domain using Foundation Models
Abstract
3D semantic understanding is essential for a wide range of robotics applications. The availability of datasets is a strong driver for research, and whilst obtaining unlabeled data is straightforward, manually annotating this data with semantic labels is time-consuming and costly. Recently, foundation models have enabled open-set semantic segmentation, which can potentially aid automatic labeling; however, these models have largely been limited to 2D images. This work introduces Label Any Pointcloud (LeAP), which leverages 2D Vision Foundation Models (VFMs) to automatically label 3D data with any set of classes for any kind of application. VFMs are used to create image labels for the desired classes, which are then projected onto 3D points. A Bayesian update combines point-wise labels into voxels, improving label consistency and allowing points outside the camera frustum to be labeled. A novel Cross-Modal Self-Training (CM-ST) approach further enhances label quality. Through extensive experiments, we demonstrate that our method can generate high-quality 3D semantic labels across diverse fields without any manual 3D labeling. Models adapted to new application domains using our labels show up to 3.7× (12.9 → 47.1) mIoU improvement compared to unadapted baselines. This ability to provide labels for any domain can help accelerate 3D perception research.
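To illustrate the voxel labeling step described above, the following is a minimal sketch (not the authors' released code) of how per-point class probabilities from projected 2D VFM labels could be fused into per-voxel posteriors with a Bayesian update; the helper name, the class count, and the uniform prior are illustrative assumptions.

import numpy as np

NUM_CLASSES = 10  # assumed number of semantic classes

def bayesian_voxel_update(point_probs, point_voxel_ids, num_voxels):
    """Fuse per-point class probabilities into per-voxel posteriors.

    point_probs:     (N, C) class scores of points projected from 2D labels
    point_voxel_ids: (N,)   voxel index of each 3D point
    Returns:         (V, C) normalized class posterior per voxel
    """
    # Start from a uniform prior, kept in log space for numerical stability.
    log_post = np.full((num_voxels, NUM_CLASSES), -np.log(NUM_CLASSES))
    # Bayesian update: multiply per-point likelihoods, i.e. sum log-probabilities.
    np.add.at(log_post, point_voxel_ids, np.log(point_probs + 1e-12))
    # Normalize back to a probability distribution per voxel.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Usage sketch: voxel_labels = bayesian_voxel_update(probs, vox_ids, V).argmax(axis=1)

Because every point falling in a voxel contributes evidence, voxels observed from several views accumulate more confident labels, and the resulting voxel labels can also be assigned to points that no camera observed directly.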
Files
File under embargo until 09-09-2026