Notes
Detecting Twenty-thousand Classes using Image-level Supervision
Detic: A Detector with image classes that can use image-level labels to easily train detectors.
Detecting Twenty-thousand Classes using Image-level Supervision,
Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra,
ECCV 2022 (arXiv 2201.02605)
Features
Detects any class given class names (using CLIP).
The detector is trained on the ImageNet-21K dataset, which covers 21K classes.
Cross-dataset generalization to OpenImages and Objects365 without finetuning.
State-of-the-art results on Open-vocabulary LVIS and Open-vocabulary COCO.
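To make the "any class given class names" feature concrete, here is a minimal sketch of how a classifier built from text embeddings can be swapped to a new vocabulary without retraining. It is not Detic's actual code: the 3-dimensional vectors are hypothetical stand-ins for CLIP text embeddings, and `classify_region`, `vocab_a`, `vocab_b`, and `region` are names invented for this illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def classify_region(region_feature, class_embeddings):
    """Score a region feature against each class-name embedding by cosine similarity."""
    feat = normalize(region_feature)
    scores = {
        name: sum(a * b for a, b in zip(feat, normalize(emb)))
        for name, emb in class_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy embeddings; real CLIP text embeddings are much higher-dimensional.
vocab_a = {"cat": [1.0, 0.1, 0.0], "dog": [0.0, 1.0, 0.1]}
vocab_b = {"zebra": [0.9, 0.0, 0.4], "bus": [0.0, 0.2, 1.0]}

# Hypothetical feature for one detected region.
region = [0.8, 0.2, 0.1]

# The same region feature is classified under two different vocabularies
# simply by swapping the set of class-name embeddings.
label_a, _ = classify_region(region, vocab_a)
label_b, _ = classify_region(region, vocab_b)
print(label_a, label_b)
```

The point of the sketch is that the classifier weights are just the embeddings of the class names, so changing the vocabulary only means supplying a different set of embeddings.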
Detic_C2_SwinB_896_4x_IN-21K+COCO Performance
Name | Training time | Objects365 box mAP | OpenImages box mAP50 |
---|---|---|---|
Box-Supervised_C2_SwinB_896_4x | 43h | 19.1 | 46.2 |
Detic_C2_SwinB_896_4x | 47h | 21.2 | 53.0 |
Detic_C2_SwinB_896_4x_IN-21K | 47h | 21.4 | 55.2 |
Box-Supervised_C2_SwinB_896_4x+COCO | 43h | 19.7 | 46.4 |
Detic_C2_SwinB_896_4x_IN-21K+COCO | 47h | 21.6 | 54.6 |
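As a quick way to read the table, the snippet below computes the gain of each Detic model over a box-supervised baseline, using only the numbers listed above. Pairing `Detic_C2_SwinB_896_4x_IN-21K+COCO` with `Box-Supervised_C2_SwinB_896_4x+COCO` is an assumption made for this illustration, and the variable names are hypothetical.

```python
# (Objects365 box mAP, OpenImages box mAP50), copied from the table above.
results = {
    "Box-Supervised_C2_SwinB_896_4x": (19.1, 46.2),
    "Detic_C2_SwinB_896_4x": (21.2, 53.0),
    "Box-Supervised_C2_SwinB_896_4x+COCO": (19.7, 46.4),
    "Detic_C2_SwinB_896_4x_IN-21K+COCO": (21.6, 54.6),
}

# Assumed baseline for each Detic model (an illustrative pairing).
pairs = {
    "Detic_C2_SwinB_896_4x": "Box-Supervised_C2_SwinB_896_4x",
    "Detic_C2_SwinB_896_4x_IN-21K+COCO": "Box-Supervised_C2_SwinB_896_4x+COCO",
}

gains = {}
for model, baseline in pairs.items():
    o365 = round(results[model][0] - results[baseline][0], 1)
    oid = round(results[model][1] - results[baseline][1], 1)
    gains[model] = (o365, oid)
    print(f"{model}: +{o365} Objects365 mAP, +{oid} OpenImages mAP50")
```

Under this pairing, the gains are 2.1 and 6.8 points for `Detic_C2_SwinB_896_4x`, and 1.9 and 8.2 points for the IN-21K+COCO model.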
Note
- Box-Supervised_C2_SwinB_896_4x and Detic_C2_SwinB_896_4x are the same models as in the Standard LVIS section, but evaluated with the Objects365/OpenImages vocabulary (i.e. with CLIP embeddings of the corresponding class names as the classifier).
- Detic_C2_SwinB_896_4x_IN-21K trains on the full ImageNet-21K dataset (also known as ImageNet-22K). It additionally uses dynamic class sampling ("Modified Federated Loss" in Section 4.4) and a larger data sampling ratio of ImageNet images (1:16 instead of 1:4).
- Detic_C2_SwinB_896_4x_IN-21K+COCO is trained on the combination of LVIS, COCO, and ImageNet-21K, mainly to give better demo results. LVIS-only models do not detect persons well because of LVIS's federated annotation protocol; LVIS+COCO models give better visual results.
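To give a feel for what the larger sampling ratio means, the following sketch computes the expected share of ImageNet images among all sampled images. It assumes the ratio is read as box-annotated images to ImageNet images (1:4 or 1:16); that reading, and the helper name `imagenet_share`, are assumptions for illustration rather than a description of Detic's data loader.

```python
from fractions import Fraction

def imagenet_share(det_weight, imagenet_weight):
    """Expected fraction of sampled images drawn from ImageNet, given a det:ImageNet ratio."""
    return Fraction(imagenet_weight, det_weight + imagenet_weight)

default_share = imagenet_share(1, 4)    # ratio 1:4
larger_share = imagenet_share(1, 16)    # ratio 1:16

print(default_share, float(default_share))
print(larger_share, round(float(larger_share), 3))
```

Under this reading, moving from 1:4 to 1:16 raises the ImageNet share of sampled images from 4/5 (80%) to 16/17 (about 94%).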
- ID
- Name: general-image-detector-detic_C2_SwinB-21K_COCO
- Model Type ID: Visual Detector
- Description: --
- Last Updated: Aug 29, 2022
- Privacy: PUBLIC
- License