GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence

LIGM, École des Ponts · Adobe · EPFL

Overview


TL;DR: GigaPose is a "hybrid" template-patch correspondence approach for estimating the 6D pose of novel objects in RGB images: GigaPose first uses templates, rendered images of the CAD models, to recover the out-of-plane rotation (2DoF), and then uses patch correspondences to estimate the remaining 4DoF. We show experimentally that GigaPose is (i) 38x faster at the coarse pose stage, (ii) robust to segmentation errors made by the 2D detector, and (iii) more accurate, with a 3.2 AP improvement on the seven core datasets of the BOP challenge.
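The template-matching step above amounts to a nearest-neighbor search in feature space: each rendered template is embedded once offline, and at test time the query embedding is matched against all template embeddings. As a rough illustration only (a brute-force numpy sketch, not the paper's actual implementation or feature extractor):

```python
import numpy as np

def nearest_template(query_feat, template_feats):
    """Return the index of the template whose embedding is most similar
    to the query embedding, by cosine similarity.

    query_feat: (D,) feature vector of the query crop.
    template_feats: (N, D) feature vectors of the N rendered templates,
                    each covering one sampled out-of-plane rotation.
    """
    q = query_feat / np.linalg.norm(query_feat)
    T = template_feats / np.linalg.norm(template_feats, axis=1, keepdims=True)
    sims = T @ q  # cosine similarity to every template
    best = int(np.argmax(sims))
    return best, float(sims[best])
```

Because templates only need to cover the 2DoF of out-of-plane rotation, N stays small and this search is fast; in practice one would use an optimized nearest-neighbor library rather than a dense matrix product.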

Abstract

We present GigaPose, a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images. GigaPose first leverages discriminative templates, rendered images of the CAD models, to recover the out-of-plane rotation, and then uses patch correspondences to estimate the four remaining parameters. Our approach samples templates in only a two-degree-of-freedom space instead of the usual three and matches the input image to the templates using a fast nearest-neighbor search in feature space, resulting in a 38x speedup over the state of the art. Moreover, GigaPose is significantly more robust to segmentation errors. Our extensive evaluation on the seven core datasets of the BOP challenge demonstrates that it achieves state-of-the-art accuracy and can be seamlessly integrated with a refinement method.
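Once the nearest template is found, the four remaining parameters (relative scale, in-plane rotation, and 2D translation) form a 2D similarity transform that a single patch correspondence suffices to pin down, given per-patch scale and in-plane rotation predictions. A minimal sketch of that geometry (the function name and interface are illustrative, not the paper's code):

```python
import numpy as np

def similarity_from_one_correspondence(q_pt, t_pt, scale, alpha):
    """Build the 3x3 homogeneous 2D similarity transform mapping a
    template point onto the matching query point.

    q_pt, t_pt: 2D pixel locations of one matching patch in the query
                image and in the nearest template, respectively.
    scale, alpha: relative scale and in-plane rotation (radians)
                  predicted for that patch pair.
    """
    c, s = np.cos(alpha), np.sin(alpha)
    R = np.array([[c, -s],
                  [s,  c]])
    # Translation chosen so that the template point lands on the query point.
    t = np.asarray(q_pt, dtype=float) - scale * R @ np.asarray(t_pt, dtype=float)
    M = np.eye(3)
    M[:2, :2] = scale * R
    M[:2, 2] = t
    return M
```

Applying `M` to the template (as in the qualitative results below) overlays it on the query crop; combined with the template's known out-of-plane rotation, this yields the full coarse 6D pose.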

Additionally, we show the potential of GigaPose with 3D models predicted by recent work on 3D reconstruction from a single image, relaxing the need for CAD models and making 6D object pose estimation much more convenient.

Video

Qualitative results


Qualitative results on the LM-O and YCB-V datasets. The first column shows CNOS's segmentation. The second and third columns illustrate the outputs of the nearest-neighbor search step: the nearest template and the 2D-to-2D correspondences. The fourth column shows the alignment achieved by applying the predicted affine transform to the template and overlaying it on the query input. The last column shows the final prediction after refinement.

BibTeX

@inproceedings{nguyen2024gigaPose,
  author    = {Nguyen, Van Nguyen and Groueix, Thibault and Salzmann, Mathieu and Lepetit, Vincent},
  title     = {{GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence}},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2024}
}

Further information

If you like this project, check out our works on novel object segmentation / pose estimation: