general-image-detector-detic_C2_IN_L_SwinB_lvis
Notes
Detecting Twenty-thousand Classes using Image-level Supervision
Detic: A Detector with image classes that can use image-level labels to easily train detectors.
Detecting Twenty-thousand Classes using Image-level Supervision,
Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra,
ECCV 2022 (arXiv 2201.02605)
Features
Detects any class given class names (using CLIP).
We train the detector on the ImageNet-21K dataset, which has 21K classes.
Cross-dataset generalization to OpenImages and Objects365 without finetuning.
State-of-the-art results on Open-vocabulary LVIS and Open-vocabulary COCO.
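The first feature, detecting a class from its name alone, relies on using text embeddings of the class names as the classifier. The following is a minimal sketch of that idea, not the Detic implementation: the 3-dimensional vectors stand in for real CLIP text embeddings and region features, and the names `classify_region` and `text_embeddings` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def classify_region(region_feature, text_embeddings):
    """Score one region feature against every class-name embedding.

    text_embeddings maps a class name to its embedding (toy values here;
    in practice these would come from a text encoder such as CLIP).
    Returns a dict of class name -> cosine similarity.
    """
    feat = l2_normalize(region_feature)
    scores = {}
    for name, emb in text_embeddings.items():
        emb = l2_normalize(emb)
        scores[name] = sum(a * b for a, b in zip(feat, emb))
    return scores


# Toy vocabulary: adding a class only requires adding its embedding,
# with no change to the scoring code.
text_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "unicycle": [0.0, 0.0, 1.0],
}
scores = classify_region([0.1, 0.2, 0.9], text_embeddings)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Because the classifier is just a table of embeddings, the vocabulary can be swapped at inference time by supplying a different set of class names.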
Detic_C2_IN-L_SwinB_896_4x Performance
Open-vocabulary LVIS
Name | Training time | mask mAP | mask mAP_novel |
---|---|---|---|
Box-Supervised_C2_R50_640_4x | 17h | 30.2 | 16.4 |
Detic_C2_IN-L_R50_640_4x | 22h | 32.4 | 24.9 |
Detic_C2_CCimg_R50_640_4x | 22h | 31.0 | 19.8 |
Detic_C2_CCcapimg_R50_640_4x | 22h | 31.0 | 21.3 |
Box-Supervised_C2_SwinB_896_4x | 43h | 38.4 | 21.9 |
Detic_C2_IN-L_SwinB_896_4x | 47h | 40.7 | 33.8 |
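To read the table as gains over the matching baseline, the short sketch below subtracts each Box-Supervised row from the IN-L Detic row with the same backbone. The numbers are copied from the table above; the helper name `gain` is illustrative.

```python
# (mask mAP, mask mAP_novel) copied from the table above.
results = {
    "Box-Supervised_C2_R50_640_4x": (30.2, 16.4),
    "Detic_C2_IN-L_R50_640_4x": (32.4, 24.9),
    "Box-Supervised_C2_SwinB_896_4x": (38.4, 21.9),
    "Detic_C2_IN-L_SwinB_896_4x": (40.7, 33.8),
}


def gain(baseline, model):
    """Return (overall gain, novel-class gain) in mAP points, rounded to 0.1."""
    base_all, base_novel = results[baseline]
    model_all, model_novel = results[model]
    return round(model_all - base_all, 1), round(model_novel - base_novel, 1)


r50_gain = gain("Box-Supervised_C2_R50_640_4x", "Detic_C2_IN-L_R50_640_4x")
swinb_gain = gain("Box-Supervised_C2_SwinB_896_4x", "Detic_C2_IN-L_SwinB_896_4x")
print("R50:", r50_gain)
print("SwinB:", swinb_gain)
```

In both cases the improvement on novel classes (8.5 and 11.9 points) is several times larger than the overall improvement (2.2 and 2.3 points).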
Note
The open-vocabulary LVIS setup is LVIS without rare class annotations in training. We evaluate rare classes as novel classes in testing.
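The open-vocabulary split described above can be sketched as a filter over an LVIS-style annotation dictionary. This is an illustrative sketch on toy data, assuming each category carries a `frequency` field whose value `"r"` marks a rare class; the function name `drop_rare_annotations` is hypothetical and is not part of the Detic tools.

```python
def drop_rare_annotations(lvis):
    """Return (training dict without rare-class annotations, set of rare ids).

    The category list is left intact so that rare classes remain in the
    vocabulary and can be evaluated as novel classes at test time.
    """
    rare_ids = {c["id"] for c in lvis["categories"] if c["frequency"] == "r"}
    kept = [a for a in lvis["annotations"] if a["category_id"] not in rare_ids]
    # Shallow copy: the input dict is not modified.
    return {**lvis, "annotations": kept}, rare_ids


# Toy data in the assumed LVIS-style layout.
toy = {
    "categories": [
        {"id": 1, "name": "cup", "frequency": "f"},
        {"id": 2, "name": "kettle", "frequency": "c"},
        {"id": 3, "name": "unicycle", "frequency": "r"},
    ],
    "annotations": [
        {"id": 10, "image_id": 100, "category_id": 1},
        {"id": 11, "image_id": 100, "category_id": 3},
        {"id": 12, "image_id": 101, "category_id": 2},
    ],
}
train_split, novel_ids = drop_rare_annotations(toy)
print(len(train_split["annotations"]), sorted(novel_ids))
```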
The models with C2 are trained using our improved LVIS baseline (Appendix D of the paper), which includes the CenterNet2 detector, Federated Loss, large-scale jittering, etc.
All models use CLIP embeddings as classifiers, which is why the box-supervised models have non-zero mAP on novel classes.
The models with IN-L use the classes shared by ImageNet-21K and LVIS as image-labeled data.
The models with CC use Conceptual Captions. CCimg uses image labels extracted from the captions (using a naive text-match) as image-labeled data. CCcapimg additionally uses the raw captions (Appendix C of the paper).
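One way to picture the naive text-match used for CCimg is a whole-word search for each class name in the caption. The sketch below is an assumption about what such a match could look like, not the procedure from the paper; `labels_from_caption` and the example class list are hypothetical.

```python
import re


def labels_from_caption(caption, class_names):
    """Return the class names that appear as whole words or phrases in the caption."""
    text = caption.lower()
    found = []
    for name in class_names:
        # \b keeps "car" from matching inside "cartoon" or "scary".
        pattern = r"\b" + re.escape(name.lower()) + r"\b"
        if re.search(pattern, text):
            found.append(name)
    return found


class_names = ["dog", "frisbee", "hot dog", "car"]
caption = "A dog catches a frisbee next to a parked car."
labels = labels_from_caption(caption, class_names)
print(labels)
```

A match this simple is noisy: a caption mentioning a "hot dog" would yield both "hot dog" and "dog" as image labels.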
The Detic models are finetuned from the corresponding Box-Supervised models above (indicated by MODEL.WEIGHTS in the config files). Please train or download the Box-Supervised model and place it under DETIC_ROOT/models/ before training the Detic models.
- Name: general-image-detector-detic_C2_IN_L_SwinB_lvis
- Model Type ID: Visual Detector
- Description: --
- Last Updated: Aug 29, 2022
- Privacy: PUBLIC