Templates for 3D Object Pose Estimation Revisited:
Generalization to New Objects and Robustness to Occlusions
CVPR 2022


We present a method that can recognize new objects and estimate their 3D pose in RGB images even under partial occlusions. Our method requires neither a training phase on these objects nor real images depicting them, only their CAD models. It relies on a small set of training objects to learn local object representations, which allow us to locally match the input image to a set of “templates”, rendered images of the CAD models for the new objects. In contrast with state-of-the-art methods, the new objects to which our method is applied can be very different from the training objects. As a result, we are the first to show generalization without retraining on the LINEMOD and Occlusion-LINEMOD datasets. Our analysis of the failure modes of previous template-based approaches further confirms the benefits of local features for template matching. We outperform the state-of-the-art template matching methods on the LINEMOD, Occlusion-LINEMOD and T-LESS datasets.



Summary: We use contrastive learning to compute local features, from which the similarity between a real image and a synthetic template can be predicted. We also introduce a new similarity measure that explicitly takes into account the object’s mask in the template and the possible occlusions in the query image. We show experimentally that our method, based on local features, generalizes better to new objects than previous template-based approaches and can be made robust to occlusions.
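To make the idea of a mask-aware, occlusion-aware similarity more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names, the cosine similarity, the value of `occlusion_threshold`, and the choice to normalize by the size of the template mask are all hypothetical. Feature maps are assumed to be small grids of feature vectors that a learned network would produce. Only locations inside the template's object mask contribute, and locations whose local similarity falls below the threshold are treated as occluded and ignored.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def masked_local_similarity(query_feats, template_feats, template_mask,
                            occlusion_threshold=0.2):
    """Average local similarity over the template's object mask.

    query_feats, template_feats: H x W grids of feature vectors.
    template_mask: H x W grid of booleans, True inside the rendered object.
    occlusion_threshold: hypothetical cutoff; locations with a lower local
    similarity are assumed to be occluded in the query and contribute nothing.
    """
    total = 0.0
    n_mask = 0
    for q_row, t_row, m_row in zip(query_feats, template_feats, template_mask):
        for q, t, inside in zip(q_row, t_row, m_row):
            if not inside:
                continue  # background of the template: never compared
            n_mask += 1
            s = cosine(q, t)
            if s > occlusion_threshold:
                total += s  # visible location: keep its similarity
    # Assumption: normalize by the mask size, so occlusion lowers the score
    # but an unrelated background cannot raise it.
    return total / n_mask if n_mask else 0.0


def best_template(query_feats, templates):
    """Return (index, score) of the template most similar to the query.

    templates: list of dicts with keys "feats" and "mask" (hypothetical layout).
    """
    scores = [masked_local_similarity(query_feats, t["feats"], t["mask"])
              for t in templates]
    best = max(range(len(templates)), key=scores.__getitem__)
    return best, scores[best]
```

In a retrieval setting, each template would also carry the pose it was rendered from, so the index returned by `best_template` would give the pose estimate for the query image.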




  title={Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions},
  author={Nguyen, Van Nguyen and Hu, Yinlin and Xiao, Yang and Salzmann, Mathieu and Lepetit, Vincent},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},


We thank Michaël Ramamonjisoa, Tom Monnier, Elliot Vincent and Romain Loiseau for valuable feedback. This research was produced within the framework of the Energy4Climate Interdisciplinary Center (E4C) of IP Paris and Ecole des Ponts ParisTech. This research was supported by the 3rd Programme d’Investissements d’Avenir [ANR-18-EUR-0006-02]. This action benefited from the support of the Chair “Challenging Technology for Responsible Energy” led by l’X – Ecole polytechnique and the Fondation de l’Ecole polytechnique, sponsored by TOTAL. This work has received funding from the CHISTERA IPALM project and was performed using HPC resources from GENCI–IDRIS 2021-AD011012294R1.

This website takes the template from Ben Mildenhall and Michaël Gharbi.