Cross-Modal Re-identification of Persons between RGB and Depth

Abstract

Cross-modal person re-identification is the task of re-identifying a person who was sensed in a first modality, such as visible light (RGB), in a second modality, such as depth. The challenge is therefore to match inputs from separate modalities without having information from both modalities at the same time step. Lately, the scientific challenge of cross-modal person re-identification between depth and RGB has been receiving growing attention, driven by the needs of intelligent vehicles but also by interested parties in the surveillance domain, where sensing in poor illumination is desirable. Techniques for cross-modal person re-identification have to solve several concurrent tasks. First, they have to be robust against variations within the single modalities, such as viewpoint changes, pose variations, or variations in camera resolution. Second, the person has to be re-identified across the modalities within a heterogeneous network of RGB and depth cameras. To date, work on cross-modal re-identification between infrared and RGB images exists, whereas almost no work has been done on re-identification between depth images and visible-light images. The objective of this work is to fill this gap by comparing the performance of different techniques for cross-modal re-identification of persons. The main contributions of this work are two-fold.
First, different deep neural network architectures for cross-modal re-identification of persons between depth and visible light are investigated and compared. Second, a new technique for cross-modal person re-identification is presented. The technique is based on two-step cross-distillation and allows similar features to be extracted from the depth and the visible-light modality. The task of matching persons sensed in depth and in visible light is thereby facilitated and can be solved with higher accuracy. In the evaluation, state-of-the-art results were achieved on two relevant datasets for cross-modal person re-identification between depth and RGB. On the BIWI RGBD-ID dataset, the pre-existing state of the art was improved by more than 15% in mean average precision. Additionally, the performance of the method was validated on the RobotPKU dataset. Although the method was successfully applied to cross-modal person re-identification between depth and RGB, it was shown that for other modality combinations, such as RGB and infrared, the technique in its current form cannot be considered state-of-the-art. Finally, a lookout on the implications of the results for the intelligent-vehicles domain is given: for a successful deployment in this area, more thorough datasets have to be developed, and the performance on sparse depth maps, as provided by lidars or radars, has to be investigated.
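The general idea behind two-step cross-distillation can be illustrated with a toy sketch: first train an encoder in the RGB modality, then freeze it and train a depth encoder so that its embeddings mimic the frozen RGB embeddings for the same person. Everything below (the linear-tanh encoders, dimensions, learning rate, and random stand-in inputs) is an illustrative assumption, not the thesis's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_in, dim_emb = 64, 16  # illustrative sizes, not from the thesis

def encoder(x, W):
    """Toy linear-tanh encoder mapping a flattened image to an embedding."""
    return np.tanh(x @ W)

# Step 1 (assumed already done): an RGB encoder was trained on RGB-only
# re-identification; its weights W_rgb now act as a frozen teacher.
W_rgb = rng.standard_normal((dim_in, dim_emb)) * 0.1    # frozen teacher
W_depth = rng.standard_normal((dim_in, dim_emb)) * 0.1  # student to train

def distill_loss(x_rgb, x_depth):
    """MSE between teacher (RGB) and student (depth) embeddings."""
    return np.mean((encoder(x_rgb, W_rgb) - encoder(x_depth, W_depth)) ** 2)

# Step 2: cross-distill -- fit the depth encoder so that its embedding of a
# depth view matches the RGB embedding of the same person.
x_rgb = rng.standard_normal(dim_in)    # stand-in for an RGB crop
x_depth = rng.standard_normal(dim_in)  # stand-in for the paired depth crop

loss_before = distill_loss(x_rgb, x_depth)
f_rgb = encoder(x_rgb, W_rgb)  # teacher target, held fixed
for _ in range(200):
    f_depth = np.tanh(x_depth @ W_depth)
    # Analytic gradient of the MSE w.r.t. W_depth (backprop through tanh).
    delta = 2.0 / dim_emb * (f_depth - f_rgb) * (1.0 - f_depth ** 2)
    W_depth -= 0.01 * np.outer(x_depth, delta)
loss_after = distill_loss(x_rgb, x_depth)
# loss_after is lower than loss_before: the two encoders now produce
# similar features, so cross-modal matching reduces to comparing
# embeddings (e.g. nearest-neighbour search) in a shared space.
```

Once the embeddings are aligned, re-identification across the modalities amounts to comparing feature distances between a depth query and an RGB gallery, which is what makes the matching task easier in the aligned space.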