Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions

Overview

TL;DR: We use contrastive learning to compute local features, from which the similarity between a real image and a synthetic template can be predicted. We also introduce a new similarity measure that explicitly takes into account the object’s mask in the template and the possible occlusions in the query image. We show experimentally that our method, based on local features, generalizes much better to new objects than previous template-based methods and can be made robust to occlusions.
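
The two ingredients of the TL;DR, a mask- and occlusion-aware similarity between local feature maps and a contrastive objective built on it, can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the exact form of the similarity (masked average of per-location cosine similarities, with locations below a threshold treated as occluded), the threshold `delta`, the temperature `tau`, the choice to disable the threshold inside the loss, and all function names are assumptions made for illustration. Local features are assumed to be already extracted and flattened to H*W vectors, and the sketch only computes values; actual training would need an autodiff framework.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two local feature vectors (0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def masked_similarity(query: List[Vector], template: List[Vector],
                      mask: List[bool], delta: Optional[float] = 0.2) -> float:
    """Average local similarity over the template's object mask.

    query, template: feature maps flattened to H*W local feature vectors.
    mask: True where the template's object mask covers a location, so
          background locations of the query are never compared.
    delta: hypothetical threshold; a masked location whose similarity is
           not above it is treated as occluded and contributes 0 instead
           of a low or negative score. None disables occlusion handling.
    """
    inside = [i for i, m in enumerate(mask) if m]
    if not inside:
        return 0.0
    total = 0.0
    for i in inside:
        s = cosine(query[i], template[i])
        if delta is None or s > delta:
            total += s
    return total / len(inside)  # normalize by the template mask area


def info_nce(queries: List[List[Vector]], templates: List[List[Vector]],
             masks: List[List[bool]], tau: float = 0.1) -> float:
    """InfoNCE loss: (queries[k], templates[k]) is the positive pair and the
    other templates in the batch serve as negatives for queries[k]."""
    n = len(queries)
    loss = 0.0
    for k in range(n):
        # Assumption: no occlusion threshold during training (delta=None).
        logits = [masked_similarity(queries[k], templates[j], masks[j], delta=None) / tau
                  for j in range(n)]
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        loss += log_z - logits[k]
    return loss / n
```

For a template with three masked locations where the query agrees on two and is reversed on the third, `masked_similarity` returns about 0.667 with the default threshold and about 0.333 with `delta=None`: ignoring the disagreeing location keeps a single occluded region from dragging the score down, while locations outside the template mask are never compared at all.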

Abstract

We present a method that can recognize new objects and estimate their 3D pose in RGB images even under partial occlusions. Our method requires neither a training phase on these objects nor real images depicting them, only their CAD models. It relies on a small set of training objects to learn local object representations, which allow us to locally match the input image to a set of “templates”, rendered images of the CAD models for the new objects. In contrast with the state-of-the-art methods, the new objects on which our method is applied can be very different from the training objects. As a result, we are the first to show generalization without retraining on the LINEMOD and Occlusion-LINEMOD datasets. Our analysis of the failure modes of previous template-based approaches further confirms the benefits of local features for template matching. We outperform the state-of-the-art template matching methods on the LINEMOD, Occlusion-LINEMOD and T-LESS datasets.
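
Because a new object only requires templates rendered from its CAD model, estimating its pose can be viewed as scoring the query against every template and keeping the best one. The sketch below assumes this nearest-template retrieval; the `Template` container, the (azimuth, elevation) pose representation, the threshold value, and the function names are hypothetical, and the similarity is a masked, thresholded cosine measure chosen for illustration rather than the paper's exact formula.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Template:
    """Hypothetical container for one rendered view of a CAD model."""
    features: List[Vector]     # local features of the rendering, flattened to H*W
    mask: List[bool]           # True inside the rendered object's mask
    pose: Tuple[float, float]  # hypothetical (azimuth, elevation) in degrees


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two local feature vectors (0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def score(query: List[Vector], template: Template, delta: float = 0.2) -> float:
    """Masked local similarity; locations scoring <= delta count as occluded (0)."""
    inside = [i for i, m in enumerate(template.mask) if m]
    if not inside:
        return 0.0
    sims = (cosine(query[i], template.features[i]) for i in inside)
    kept = [s for s in sims if s > delta]
    return sum(kept) / len(inside)


def estimate_pose(query: List[Vector], templates: List[Template]) -> Tuple[float, float]:
    """Return the pose of the best-matching template (nearest-template retrieval)."""
    best = max(templates, key=lambda t: score(query, t))
    return best.pose


templates = [
    Template([[1.0, 0.0]] * 3, [True] * 3, (0.0, 30.0)),
    Template([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]], [True] * 3, (90.0, 30.0)),
]
query = [[0.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]  # last location occluded
print(estimate_pose(query, templates))  # prints (90.0, 30.0)
```

Nothing in this loop is specific to the training objects: adding an object only means rendering more `Template` entries. Scoring every template is linear in the number of templates; the brute-force loop is chosen for clarity, not speed.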

Qualitative results

Qualitative results on LM-O (left) and T-LESS (right) datasets. The first column shows the input test images, the second column shows the ground-truth poses, and the last two columns show the results of our method.

BibTeX

@inproceedings{nguyen2022templates,
  title     = {Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions},
  author    = {Nguyen, Van Nguyen and Hu, Yinlin and Xiao, Yang and Salzmann, Mathieu and Lepetit, Vincent},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages     = {6771--6780},
  year      = {2022}
}

Further information

If you like this project, check out our works on novel object segmentation / pose estimation: