Differentiable Product Quantization for Memory-Efficient Camera Relocalization

ECCV 2024

1VRG, FEE, Czech Technical University in Prague, 2Aalto University, 3CIIRC, Czech Technical University in Prague, 4University of Oulu
*Denotes equal contribution

TL;DR: We propose a scene-specific auto-encoder (D-PQED) that performs local image descriptor quantization/dequantization in a differentiable manner, achieving accurate visual localization performance under a very limited memory budget.

Abstract

Camera relocalization relies on 3D scene models whose large memory footprint is incompatible with the memory budget of many applications. One way to reduce the scene memory size is map compression, i.e., removing a subset of the 3D points and quantizing the descriptors. This achieves high compression rates but leads to a performance drop due to information loss. To address the memory-performance trade-off, we train a lightweight scene-specific auto-encoder network that performs descriptor quantization and dequantization in an end-to-end differentiable manner, updating both the product-quantization centroids and the network parameters through back-propagation. In addition to optimizing the network for descriptor reconstruction, we encourage it to preserve descriptor-matching performance with margin-based metric loss functions. Results show that, for a local-descriptor memory budget of only 1 MB, the combination of the proposed network and map compression achieves the best performance on Aachen Day-Night compared to existing compression methods.
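
As an illustration of how the reconstruction and margin-based metric objectives mentioned above could be combined, here is a minimal PyTorch sketch. The function and tensor names (recon, orig, orig_neg, alpha) are hypothetical and the exact formulation in the paper may differ; this is only a sketch of the general idea.

```python
import torch.nn.functional as F

def training_loss(recon, orig, orig_neg, margin=0.5, alpha=1.0):
    """Illustrative combined objective (not the paper's exact loss).

    recon:    dequantized descriptors produced by the auto-encoder, (N, D)
    orig:     the corresponding original descriptors, (N, D)
    orig_neg: non-matching original descriptors, (N, D)
    """
    # Reconstruction term: dequantized descriptors should match the originals.
    l_rec = F.mse_loss(recon, orig)
    # Margin-based metric term: a reconstructed descriptor should remain closer
    # to its matching original descriptor than to a non-matching one.
    l_metric = F.triplet_margin_loss(recon, orig, orig_neg, margin=margin)
    return l_rec + alpha * l_metric
```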

Method

In this work, we use differentiable product quantization to perform memory-efficient camera relocalization. Specifically, a set of local image descriptors extracted from an image is fed into an encoder parameterized by M codebooks, which produces a quantized representation of the input vectors. The quantized descriptors are then passed to a scene-specific differentiable decoder that recovers the original descriptors, which are subsequently used in the localization pipeline. Together, the encoder and decoder form a layer we call D-PQED.
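
The sketch below shows one way such a differentiable product-quantization encoder/decoder could be implemented in PyTorch. The class name, dimensions, soft-assignment temperature, and the small MLP decoder are all assumptions for illustration, not the paper's exact architecture; it is meant only to make the encode/dequantize flow concrete.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DPQEDSketch(nn.Module):
    """Illustrative differentiable PQ encoder + scene-specific decoder."""

    def __init__(self, dim=128, num_books=8, num_centroids=256, tau=1.0):
        super().__init__()
        assert dim % num_books == 0
        self.M = num_books
        self.sub_dim = dim // num_books
        # M codebooks, each with K learnable centroids of dimension dim / M.
        self.codebooks = nn.Parameter(
            torch.randn(num_books, num_centroids, self.sub_dim))
        self.tau = tau
        # Scene-specific decoder mapping dequantized vectors back towards the
        # original descriptors (a small MLP as a stand-in).
        self.decoder = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def encode(self, x):
        # x: (N, dim) local descriptors, split into M sub-vectors of size dim/M.
        sub = x.view(-1, self.M, self.sub_dim)                       # (N, M, d)
        # Squared distances to every centroid of the corresponding codebook.
        d2 = torch.cdist(sub.transpose(0, 1), self.codebooks) ** 2   # (M, N, K)
        # Soft assignments keep quantization differentiable during training;
        # at test time only the argmax centroid indices would be stored.
        return F.softmax(-d2 / self.tau, dim=-1)                     # (M, N, K)

    def dequantize(self, assign):
        # Convex combination of centroids per sub-space, then concatenate.
        recon = torch.einsum('mnk,mkd->mnd', assign, self.codebooks)  # (M, N, d)
        recon = recon.transpose(0, 1).reshape(assign.shape[1], -1)    # (N, dim)
        return self.decoder(recon)

    def forward(self, x):
        return self.dequantize(self.encode(x))


# Usage sketch: back-propagation updates centroids and decoder jointly.
model = DPQEDSketch(dim=128, num_books=8, num_centroids=256)
desc = torch.randn(2048, 128)          # e.g. local descriptors from one image
recon = model(desc)
loss = F.mse_loss(recon, desc)         # reconstruction objective
loss.backward()
```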

Results

We compare the localization performance of the proposed approach with state-of-the-art methods on the Aachen Day-Night, 7Scenes, and Cambridge Landmarks datasets.

The proposed method outperforms other map compression methods in terms of both localization accuracy and memory budget. The best and second-best results are marked in bold and underlined, respectively; map compression methods are shown in italics.

BibTeX

@inproceedings{Laskar2024dpqed,
      author    = {Laskar, Zakaria and Melekhov, Iaroslav and Benbihi, Assia and Wang, Shuzhe and Kannala, Juho},
      title     = {Differentiable Product Quantization for Memory Efficient Camera Relocalization},
      booktitle = {European Conference on Computer Vision (ECCV)},
      year      = {2024},
}

Acknowledgements

This work was supported by the Johannes Amos Comenius Programme CZ.02.01.01/00/22_010/0003405 and the Czech Science Foundation (GACR) EXPRO (grant no. 23-07973X). JK acknowledges funding from the Academy of Finland (grants No. 327911 and 353138) and support by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. We acknowledge the OP VVV funded project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics", CSC – IT Center for Science, Finland, and the Aalto Science-IT project for computational resources.