Domain shift-aware Ensemble-based Visual Place Recognition

de Leeuw, W.F.

Abstract

Visual Place Recognition (VPR) describes a task in which an agent (e.g., a robot) attempts to recognize its current location by comparing incoming visual data from its sensor(s), usually a camera, to geotagged reference images. The incoming images are referred to as query images. Both query and reference images are described using a feature extractor, and the query descriptor is matched to its closest reference descriptor in the feature space. Many VPR techniques have been proposed over the years, with different types of architecture and trained on different datasets. Despite the many test datasets available, no single VPR technique reaches state-of-the-art performance on all of them. For this reason, existing work has argued that it can be beneficial to use an ensemble-based method that combines multiple VPR techniques to achieve better VPR performance. Some ensemble-based methods have already been proposed. These methods combine individual VPR techniques and weight each technique's predictions using an indication of confidence derived from those same predictions. This calculation, however, is based strictly on predictions obtained by applying the VPR techniques to test data at inference time. In VPR research, the dataset used to train a technique often differs from the dataset it is tested on, which means there is a domain shift between the training and test data. Existing methods do not take this domain shift into account when weighting the predictions of the VPR techniques in an ensemble. In this work, we analyze how the degree of domain shift between train and test data, which can be observed by looking at the relative location of descriptors in the feature space, impacts downstream VPR performance. Intuitively, one would expect better VPR performance when the degree of train-test domain shift is minimal. Our analysis shows that this is indeed the case.
We propose two different methods that use this degree of domain shift to calculate the weights given to the VPR techniques in an ensemble. First, we propose a generative method. Here, each individual technique is given a weight based on the likelihood that the query sample originates from the same distribution as the technique's training dataset, i.e., that it is in-distribution. Second, we propose a discriminative method. Here, weights are given to the training datasets used to train the techniques in the ensemble. Each training dataset is weighted by its relative proximity to the query sample in the feature space, which serves as an indicator of the degree of domain shift between the training dataset and the query sample. Each VPR technique is then given the weight corresponding to its training dataset. We compare the proposed approaches to other ensemble-based baselines and to individual VPR techniques. The quantitative results show that our proposed methods generally outperform the ensemble-based baselines and the individual VPR techniques. We also propose directions for future work. One of the generative methods still delivers lower performance than should be possible, which is caused by applying the method to high-dimensional descriptors. Solving this issue should lead to higher VPR performance with this method. Additionally, we suggest that future work expand the set of datasets used in this research, to strengthen the claims made and to verify that the results and trends found hold up when training and testing on other datasets.
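As an illustration only, the proximity-based weighting idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in this work: summarizing each training dataset by a single centroid, converting distances to weights with a softmax, scoring matches with cosine similarity, and all function names (`match_scores`, `proximity_weights`, `ensemble_match`) are hypothetical choices made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two descriptors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_scores(query_desc, ref_descs):
    """Similarity of one query descriptor to every reference descriptor."""
    return [cosine_similarity(query_desc, r) for r in ref_descs]


def proximity_weights(query_point, train_centroids, temperature=1.0):
    """Assumed weighting: softmax over negative Euclidean distances between
    the query and one centroid per training dataset. Closer datasets
    (smaller assumed domain shift) receive larger weights."""
    logits = [-math.dist(query_point, c) / temperature for c in train_centroids]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_match(scores_per_technique, dataset_of_technique, dataset_weights):
    """Weighted sum of per-technique scores; each technique inherits the
    weight of the dataset it was trained on. Returns the index of the
    best-scoring reference image and the combined scores."""
    n_refs = len(scores_per_technique[0])
    combined = [0.0] * n_refs
    for scores, dataset in zip(scores_per_technique, dataset_of_technique):
        w = dataset_weights[dataset]
        for i, s in enumerate(scores):
            combined[i] += w * s
    best = max(range(n_refs), key=combined.__getitem__)
    return best, combined


# Toy data: two training datasets, two techniques, three reference images.
weights = proximity_weights([1.0, 0.0], [[0.0, 0.0], [10.0, 0.0]])
refs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
scores_a = match_scores([1.0, 0.0], refs)  # technique trained on dataset 0
scores_b = match_scores([0.0, 1.0], refs)  # technique trained on dataset 1
best, combined = ensemble_match([scores_a, scores_b], [0, 1], weights)
```

In this toy setup the query lies much closer to the centroid of dataset 0, so the technique trained on that dataset dominates the combined score. The temperature parameter is an assumed knob that controls how sharply the weights concentrate on the nearest training dataset.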
