general-image-detector-detic_clipR50Caption-coco

Notes

Detecting Twenty-thousand Classes using Image-level Supervision

Detic: A Detector with image classes that can use image-level labels to easily train detectors.

Detecting Twenty-thousand Classes using Image-level Supervision,
Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra,
ECCV 2022 (arXiv 2201.02605)

Features

  • Detects any class given class names (using CLIP).

  • We train the detector on the ImageNet-21K dataset, which has 21K classes.

  • Cross-dataset generalization to OpenImages and Objects365 without finetuning.

  • State-of-the-art results on Open-vocabulary LVIS and Open-vocabulary COCO.
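The first feature above, detecting any class from its name, can be illustrated with a minimal sketch. It assumes that each region feature is scored against one text embedding per class name by cosine similarity; the 3-dimensional vectors are hypothetical stand-ins for real CLIP text embeddings, and `classify_region` is an illustrative helper, not part of the Detic code base.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def classify_region(region_feature, class_embeddings):
    """Score one region feature against every class-name embedding.

    Returns the best-scoring class name and the full score dictionary.
    Scores are cosine similarities (an assumption made for this sketch).
    """
    feat = l2_normalize(region_feature)
    scores = {}
    for name, emb in class_embeddings.items():
        unit_emb = l2_normalize(emb)
        scores[name] = sum(a * b for a, b in zip(feat, unit_emb))
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy embeddings; a real system would obtain these from a
# CLIP text encoder applied to the class names.
toy_class_embeddings = {
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.0, 1.0, 0.1],
    "umbrella": [0.1, 0.0, 1.0],
}
toy_region_feature = [0.9, 0.2, 0.1]

best_class, class_scores = classify_region(toy_region_feature, toy_class_embeddings)
print(best_class, class_scores)
```

Because the classifier is just a dictionary of embeddings, changing the vocabulary in this sketch means adding or removing entries; no detector weights are involved.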

Detic_CLIP_R50_1x_caption-image Performance

Open-vocabulary COCO

Name                              Training time  box mAP50  box mAP50_novel
BoxSup_CLIP_R50_1x                12h            39.3       1.3
Detic_CLIP_R50_1x_image           13h            44.7       24.1
Detic_CLIP_R50_1x_caption         16h            43.8       21.0
Detic_CLIP_R50_1x_caption-image   16h            45.0       27.8

Note

  • All models are trained with ResNet50-C4 without multi-scale augmentation. All models use CLIP embeddings as the classifier.

  • We extract class names from COCO-captions as image-labels. Detic_CLIP_R50_1x_image uses the max-size loss; Detic_CLIP_R50_1x_caption directly uses CLIP caption embedding within each mini-batch for classification; Detic_CLIP_R50_1x_caption-image uses both losses.

  • We report box mAP50 under the "generalized" open-vocabulary setting.
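The max-size loss mentioned in the notes above can be sketched as follows. This is a simplified illustration, assuming the loss assigns the image-level labels to the largest proposal and applies a per-class binary cross-entropy to that proposal's logits. The function names, box format `(x1, y1, x2, y2)`, and toy values are hypothetical and are not taken from the released training code.

```python
import math


def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def max_size_loss(proposals, logits, image_labels, num_classes):
    """Binary cross-entropy on the largest proposal only.

    proposals:    list of (x1, y1, x2, y2) boxes
    logits:       per-proposal list of num_classes raw scores
    image_labels: set of class indices present in the image
    Returns (index of the largest proposal, loss value).
    """
    largest = max(range(len(proposals)), key=lambda i: box_area(proposals[i]))
    loss = 0.0
    for cls in range(num_classes):
        target = 1.0 if cls in image_labels else 0.0
        prob = sigmoid(logits[largest][cls])
        loss -= target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob)
    return largest, loss


# Toy example: three proposals, three classes, image labelled with class 0.
toy_proposals = [(0, 0, 10, 10), (0, 0, 50, 40), (5, 5, 20, 20)]
toy_logits = [[0.0, 0.0, 0.0], [4.0, -4.0, -4.0], [0.0, 0.0, 0.0]]

picked, loss_match = max_size_loss(toy_proposals, toy_logits, {0}, 3)
_, loss_mismatch = max_size_loss(toy_proposals, toy_logits, {1}, 3)
print(picked, loss_match, loss_mismatch)
```

In this sketch only the largest box contributes to the loss, so no box-level annotation is needed for images that carry image-level labels.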

Inference with LVIS Vocabulary

  • ID
  • Name
    detic-clip-r50-1x_caption-CPU
  • Model Type ID
    Visual Detector
  • Last Updated
    Aug 29, 2022
  • Privacy
    PUBLIC
  • License