DEIT (Data-Efficient Image Transformer) is a state-of-the-art machine learning model for image classification. It is a transformer-based model, applying the same architecture family used in natural language processing to images, and it achieves competitive results on ImageNet with no external data. As its name suggests, DEIT is data-efficient: it reaches high accuracy with relatively modest amounts of training data. It is trained with a teacher-student strategy, which yields results competitive with convnets both on ImageNet and when transferring to other tasks. With its relatively lightweight architecture, DEIT is a practical tool for tasks such as object recognition, scene recognition, and image retrieval.
A Data-Efficient Image Transformer is a type of Vision Transformer for image classification tasks. The model is trained using a teacher-student strategy specific to transformers, which leads to results competitive with convnets both on ImageNet and when transferring to other tasks. It relies on a distillation token, which ensures that the student learns from the teacher through attention. DEIT-Base uses a linear classifier for pre-training rather than an MLP head. The model has 86M parameters, making it relatively lightweight compared to other state-of-the-art models.
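The teacher-student objective above can be sketched in plain Python. This is a minimal illustration of DeiT-style "hard" distillation, not the actual training code: the class token is supervised by the ground-truth label and the distillation token by the teacher's hard prediction. All function names here are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability of the target label under softmax(logits)."""
    return -math.log(softmax(logits)[label])

def hard_distillation_loss(cls_logits, distill_logits, teacher_logits, label):
    """Hard distillation sketch: average the class-token loss against the
    true label with the distillation-token loss against the teacher's argmax."""
    teacher_label = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    return 0.5 * cross_entropy(cls_logits, label) + \
           0.5 * cross_entropy(distill_logits, teacher_label)

# Toy 3-class example: the student agrees with both the label (2) and the teacher.
loss = hard_distillation_loss([0.1, 0.2, 3.0], [0.0, 0.1, 2.5],
                              [1.0, 0.5, 4.0], label=2)
```

When student and teacher both match the label, both halves of the loss are small; disagreement on either token increases it.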
- DEIT can be used for image classification tasks, such as object recognition, scene recognition, and image retrieval.
- DEIT can be used in resource-constrained environments where computational resources are limited.
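As a usage sketch, the distilled DEIT-Base checkpoint published on the Hugging Face Hub can be run for classification roughly as follows (this assumes the `transformers` and `Pillow` packages are installed and the checkpoint can be downloaded; a blank placeholder image is used so the snippet is self-contained):

```python
from PIL import Image
from transformers import AutoImageProcessor, DeiTForImageClassificationWithTeacher

# Distilled DeiT-Base checkpoint (16x16 patches, 224x224 input).
checkpoint = "facebook/deit-base-distilled-patch16-224"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = DeiTForImageClassificationWithTeacher.from_pretrained(checkpoint)

# Placeholder image; in practice, load your own with Image.open(path).
image = Image.new("RGB", (224, 224), "white")
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# The logits average the class-token and distillation-token heads
# over the 1,000 ImageNet classes.
predicted = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted])
```

For real inputs, replace the placeholder with an image loaded from disk or a URL; the processor handles resizing and normalization.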
DEIT-Base is trained on the ImageNet dataset, which contains over 1 million images across 1,000 classes. The dataset is split into training, validation, and test sets. DEIT-Base is also evaluated on ImageNet V2 and ImageNet Real, which have test sets distinct from the ImageNet validation set.
DEIT-Base achieves top-1 accuracy of 83.1% on ImageNet with no external data. It outperforms previous ViT models trained only on ImageNet-1k by 6.3%. DEIT-Base also outperforms the ViT-B model pre-trained on JFT-300M at resolution 384 by 1% (top-1 acc.). DEIT-Base outperforms the state of the art on the trade-off between accuracy and GPU inference time.
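Top-1 accuracy as reported above is simply the fraction of images whose highest-scoring class matches the ground-truth label; a minimal sketch (the function name is illustrative):

```python
def top1_accuracy(logits_batch, labels):
    """Fraction of examples whose argmax class equals the label."""
    correct = 0
    for logits, label in zip(logits_batch, labels):
        pred = max(range(len(logits)), key=logits.__getitem__)
        correct += (pred == label)
    return correct / len(labels)

# Toy batch of three 4-class predictions; two match their labels.
acc = top1_accuracy([[0.9, 0.1, 0.0, 0.0],
                     [0.2, 0.5, 0.2, 0.1],
                     [0.1, 0.1, 0.7, 0.1]],
                    [0, 2, 2])
# acc == 2/3
```

Reported ImageNet numbers compute exactly this ratio over the 50,000-image validation set.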
- DEIT-Base is limited by the size of the ImageNet dataset and may not generalize well to other datasets.
- DEIT-Base may be limited by the complexity of the image classification task and may not perform well on more complex tasks, such as object detection and semantic segmentation.
- As with other deep learning models, DEIT-Base may lack interpretability, making it difficult to understand how the model makes its predictions.
- Model Type ID: Visual Classifier
- Description: A Data-Efficient Image Transformer (DEIT) is a state-of-the-art image classification model pre-trained and fine-tuned on ImageNet-1k (1 million images, 1,000 classes)
- Last Updated: Jul 26, 2023