Unsupervised 3D object detection methods can reduce the reliance on human annotations by leveraging raw sensor data directly for supervision. Recent approaches combine density-based spatial clustering with motion and appearance cues to extract object proposals from the scene, which serve as pseudo-annotations. However, density-based methods struggle with the uneven point densities of LiDAR point clouds and fail to distinguish foreground from background objects effectively. To address these issues, this thesis introduces MobileClusterNet, a learnable framework designed for 3D spatial clustering. MobileClusterNet incorporates a novel loss module that utilizes appearance embeddings alongside scene flow information, thereby learning to generate high-quality clusters consisting of both static and dynamic mobile objects. Annotations generated by MobileClusterNet can be used to train any existing supervised detector, without the need for extensive self-training. Experimental results on the Waymo Open Dataset demonstrate that MobileClusterNet outperforms traditional density-based methods such as HDBSCAN in clustering performance by a large margin, and provides high-quality proposals for training supervised detectors.
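The density problem referenced above can be illustrated with a minimal sketch (not taken from the thesis): LiDAR return density falls off sharply with range, so a single HDBSCAN density threshold that separates near-range objects cleanly tends to discard sparse far-range objects as noise. The synthetic scene, point counts, and the min_cluster_size value below are illustrative assumptions only.

```python
# Minimal sketch of HDBSCAN's sensitivity to uneven LiDAR point density.
# All scene parameters here are assumptions for illustration, not values
# from the thesis. Requires scikit-learn >= 1.3 for sklearn.cluster.HDBSCAN.
import numpy as np
from sklearn.cluster import HDBSCAN

rng = np.random.default_rng(0)

def simulated_object(center, n_points, spread=0.4):
    """Gaussian blob standing in for LiDAR returns on one object."""
    return rng.normal(loc=center, scale=spread, size=(n_points, 3))

# Near object: dense returns. Far object: same physical extent but far
# fewer points, mimicking how LiDAR density decreases with range.
near = simulated_object(center=[5.0, 0.0, 0.0], n_points=400)
far = simulated_object(center=[60.0, 0.0, 0.0], n_points=12)
ground = rng.uniform(low=[-80, -40, -0.1], high=[80, 40, 0.1], size=(2000, 3))

points = np.vstack([near, far, ground])
labels = HDBSCAN(min_cluster_size=20).fit_predict(points)

# The far object's points typically come back as noise (label -1), since a
# single density threshold cannot fit both the near and far regimes.
for name, block in [("near", slice(0, 400)), ("far", slice(400, 412))]:
    print(name, np.unique(labels[block], return_counts=True))
```

A learnable clustering module of the kind the thesis proposes sidesteps this fixed-threshold behavior by letting supervision from motion and appearance cues, rather than a hand-tuned density parameter, decide what constitutes an object cluster.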