TL;DR: CNOS is a simple approach to segment novel objects in RGB images from only their CAD models. CNOS first uses Segment Anything to generate object masks, then uses the "CLS" token of DINOv2 to assign a class and confidence score to each mask.
We propose a simple yet powerful method to segment novel objects in RGB images from their CAD models. Leveraging recent foundation models, Segment Anything and DINOv2, we generate segmentation proposals in the input image and match them against object templates that are pre-rendered from the CAD models. The matching is realized by comparing DINOv2 CLS tokens of the proposed regions and the templates. The output of the method is a set of segmentation masks associated with per-object confidences defined by the matching scores. We experimentally demonstrate that the proposed method achieves state-of-the-art results in CAD-based novel object segmentation on the seven core datasets of the BOP challenge, surpassing the recent method of Chen et al. by an absolute 19.8% AP.
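To make the matching step concrete, here is a minimal sketch in PyTorch of how proposal features could be scored against template features. It is not the official implementation: the tensor names (`proposal_feats`, `template_feats`) and the aggregation by averaging the top-k template similarities are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def match_proposals(proposal_feats, template_feats, top_k=5):
    """Assign each segmentation proposal to the best-matching object.

    proposal_feats: DINOv2 CLS tokens of the SAM proposals, shape [P, d]  (assumed)
    template_feats: CLS tokens of pre-rendered templates, shape [O, T, d] (assumed)
    """
    # L2-normalize so that dot products become cosine similarities.
    p = F.normalize(proposal_feats, dim=-1)    # [P, d]
    t = F.normalize(template_feats, dim=-1)    # [O, T, d]

    # Cosine similarity between every proposal and every template view.
    sim = torch.einsum("pd,otd->pot", p, t)    # [P, O, T]

    # Aggregate over template views: mean of the top-k most similar ones
    # (the exact aggregation strategy here is an assumption).
    topk = sim.topk(k=min(top_k, sim.shape[-1]), dim=-1).values
    per_object = topk.mean(dim=-1)             # [P, O]

    # The best-matching object gives the label; its similarity
    # serves as the per-object confidence score.
    scores, labels = per_object.max(dim=-1)    # [P], [P]
    return labels, scores
```

In this sketch, a confidence threshold (e.g. 0.5, as used for the qualitative figures below) can then be applied to `scores` to filter the final detections.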
Qualitative results on LM-O, HB, and YCB-V datasets. The first column shows the input CAD models. In cases where there are more than 16 models, we only show the first 16 to ensure better visibility. The second column shows the input RGB image, and the last two columns depict the detections produced by CNOS (SAM) and CNOS (FastSAM) with confidence scores greater than 0.5. Interestingly, in the last row, even though the segmentation proposals of CNOS (SAM) and CNOS (FastSAM) are very similar, their final labels differ for a few objects.
Qualitative results on the YCB-V dataset. We show the segmentation "per frame" (i.e., no temporal constraints).
Demo. We provide demo code to run CNOS on your custom objects. Please follow the instructions in our GitHub repo.
@inproceedings{nguyen2023cnos,
title = {CNOS: A Strong Baseline for CAD-based Novel Object Segmentation},
author = {Nguyen, Van Nguyen and Groueix, Thibault and Ponimatkin, Georgy and Lepetit, Vincent and Hodan, Tomas},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages = {2134--2140},
year = {2023}
}