Learning from the unseen: Reducing train-test domain gaps by fine-tuning on reference images at test time
Abstract
Visual place recognition (VPR) is a form of visual localization. Current approaches are designed to handle common VPR challenges, such as appearance and viewpoint variations. Since the introduction of DINOv2, vision foundation models have been used as feature extractors for VPR techniques, as their image representations generalize well. Fine-tuning these large models on VPR-specific datasets improves performance further. However, these large VPR datasets are biased towards urban environments. To address this, we propose a simple pipeline that fine-tunes existing techniques on the reference databases of the test datasets. Our experiments show that reference-database fine-tuning improves performance for multiple techniques across different datasets. To also handle appearance and viewpoint variations, image augmentations are applied during training, completing the pipeline. The experiments show improvements even for datasets with a large query–reference domain gap, provided that part of the test queries is known during fine-tuning.
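The core idea — fine-tuning a feature extractor on the reference database of the test dataset, with augmentations standing in for appearance and viewpoint variation — can be sketched in miniature. The linear "extractor", the view-similarity objective, and the flip/brightness augmentations below are illustrative assumptions, not the paper's actual method; a real system would fine-tune a DINOv2-based backbone with a VPR training loss.

```python
import numpy as np

D_EMB, H, W_IMG = 8, 16, 16  # toy descriptor size and image size

def augment(img, rng):
    """Cheap proxy for appearance/viewpoint variation: random flip + brightness jitter."""
    out = img[:, ::-1] if rng.random() < 0.5 else img
    return np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)

def prep(img):
    """Flatten, zero-center, and L2-normalize an image."""
    x = img.ravel().astype(float)
    x = x - x.mean()
    return x / np.linalg.norm(x)

def embed(W, img):
    """Project a prepared image to an L2-normalized descriptor."""
    v = W @ prep(img)
    return v / np.linalg.norm(v)

def finetune_on_references(W, references, steps=1000, lr=0.1, seed=0):
    """Pull the descriptors of two augmented views of each reference image together.

    Each step takes a gradient-ascent step on the unnormalized view similarity
    (W xa) . (W xb), then rescales W to keep the weights bounded.
    """
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        img = references[rng.integers(len(references))]
        xa, xb = prep(augment(img, rng)), prep(augment(img, rng))
        W = W + lr * (np.outer(W @ xb, xa) + np.outer(W @ xa, xb))
        W /= np.linalg.norm(W)
    return W

def mean_view_similarity(W, refs, seed):
    """Average cosine similarity between two augmented views of each reference."""
    rng = np.random.default_rng(seed)
    sims = [embed(W, augment(im, rng)) @ embed(W, augment(im, rng)) for im in refs]
    return float(np.mean(sims))

rng = np.random.default_rng(0)
references = [rng.random((H, W_IMG)) for _ in range(20)]  # stand-in reference database
W0 = rng.normal(size=(D_EMB, H * W_IMG))
W0 /= np.linalg.norm(W0)

before_sim = mean_view_similarity(W0, references, seed=1)
Wt = finetune_on_references(W0.copy(), references)
after_sim = mean_view_similarity(Wt, references, seed=1)
```

After fine-tuning, descriptors of differently augmented views of the same reference image become more similar, which mirrors how reference-database fine-tuning adapts the extractor to the test-time domain without needing the queries themselves.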