ViT-Classification-CIFAR10

Model Description

This model is a Vision Transformer (ViT) architecture trained on the CIFAR-10 dataset for image classification. It is trained from scratch without pre-training on a larger dataset.

Metrics:

  • Test accuracy: 82.04%
  • Test loss: 0.5560
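As a minimal sketch of how figures like these are commonly computed, the snippet below derives accuracy and mean loss from raw class scores. It assumes the reported loss is the mean cross-entropy over the test set, which the card does not state. The names `softmax` and `evaluate` are illustrative and not taken from the author's code.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def evaluate(all_logits, labels):
    """Return (accuracy, mean cross-entropy) over a set of predictions.

    Assumption: the loss is the mean negative log-probability of the true class.
    """
    correct = 0
    total_loss = 0.0
    for logits, label in zip(all_logits, labels):
        probs = softmax(logits)
        total_loss += -math.log(probs[label])
        predicted = max(range(len(logits)), key=logits.__getitem__)
        if predicted == label:
            correct += 1
    n = len(labels)
    return correct / n, total_loss / n


# Toy example with 3 classes and 2 samples (not real model outputs).
toy_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
toy_labels = [0, 1]
accuracy, loss = evaluate(toy_logits, toy_labels)
print(f"accuracy={accuracy:.2f}, loss={loss:.4f}")
```

On the real test set, `all_logits` would hold one 10-element score list per image.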

Training Configuration

Hardware: NVIDIA RTX 3090

Training parameters:

  • Epochs: 200
  • Batch size: 2048
  • Input size: 3x32x32
  • Patch size: 4
  • Sequence length: 8*8 = 64 patches (32/4 = 8 patches per side)
  • Embed size: 128
  • Num of layers: 12
  • Num of heads: 4
  • Forward multiplier: 2
  • Dropout: 0.1
  • Optimizer: AdamW
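The listed parameters imply a few derived sizes that are useful when reimplementing the model. The sketch below works them out. It assumes that "forward multiplier" scales the embed size to give the MLP hidden width, and that each encoder block uses the standard layout (four attention projections, a two-layer MLP, and two LayerNorms). Neither assumption is confirmed by the card. `ViTConfig` and `approx_encoder_params` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    # Values copied from the training parameters listed above.
    in_channels: int = 3
    image_size: int = 32
    patch_size: int = 4
    embed_size: int = 128
    num_layers: int = 12
    num_heads: int = 4
    forward_mul: int = 2

    @property
    def num_patches(self) -> int:
        # Patches per side, squared.
        return (self.image_size // self.patch_size) ** 2

    @property
    def patch_dim(self) -> int:
        # Flattened pixel values per patch: channels * height * width.
        return self.in_channels * self.patch_size ** 2

    @property
    def head_dim(self) -> int:
        # Embedding channels handled by each attention head.
        return self.embed_size // self.num_heads

    @property
    def ffn_hidden(self) -> int:
        # Assumption: the multiplier scales embed_size to the MLP hidden width.
        return self.embed_size * self.forward_mul


def approx_encoder_params(cfg: ViTConfig) -> int:
    """Rough parameter count for the encoder blocks only (assumed standard layout)."""
    d, h = cfg.embed_size, cfg.ffn_hidden
    attention = 4 * (d * d + d)        # Q, K, V and output projections with biases
    mlp = (d * h + h) + (h * d + d)    # two linear layers with biases
    layer_norms = 2 * (2 * d)          # two LayerNorms, scale and shift each
    return cfg.num_layers * (attention + mlp + layer_norms)


cfg = ViTConfig()
print(cfg.num_patches, cfg.patch_dim, cfg.head_dim, cfg.ffn_hidden)  # 64 48 32 256
print(approx_encoder_params(cfg))  # 1589760 under the assumptions above
```

The estimate leaves out the patch embedding, positional embeddings, any class token, and the classification head, so the real model will have somewhat more parameters.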

Intended Uses & Limitations

This model is intended for practice and for exploring ViT architectures on the CIFAR-10 dataset. It can be used for image classification on similar datasets of small color images.

Limitations:

  • This model is trained on a relatively small dataset (CIFAR-10) and might not generalize well to images outside the CIFAR-10 distribution.
  • The model is trained from scratch rather than pre-trained on a larger dataset and then fine-tuned, which potentially limits its performance compared to a fine-tuned model.
  • Training is performed on a single NVIDIA RTX 3090.

Training Data

The model is trained on the CIFAR-10 dataset, containing 60,000 32x32 color images in 10 classes.

  • Training set: 50,000 images
  • Test set: 10,000 images

Data Source: https://paperswithcode.com/dataset/cifar-10
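Combining the split sizes above with the batch size and epoch count from the training configuration gives a rough picture of the training schedule. The sketch below is only arithmetic on the numbers in this card. Whether the final partial batch was kept or dropped is not stated, so both cases are shown, and the helper name `steps_per_epoch` is hypothetical.

```python
import math

# Numbers taken from this card.
TRAIN_IMAGES = 50_000
TEST_IMAGES = 10_000
BATCH_SIZE = 2048
EPOCHS = 200
TEST_ACCURACY = 0.8204


def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of optimizer steps needed to pass over the data once."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


# Assumption: the final partial batch is kept.
steps_keep = steps_per_epoch(TRAIN_IMAGES, BATCH_SIZE)
# Alternative assumption: the final partial batch is dropped.
steps_drop = steps_per_epoch(TRAIN_IMAGES, BATCH_SIZE, drop_last=True)

last_batch_size = TRAIN_IMAGES - (steps_keep - 1) * BATCH_SIZE
correct_test_images = round(TEST_ACCURACY * TEST_IMAGES)

print(steps_keep, steps_keep * EPOCHS)  # 25 5000
print(steps_drop, steps_drop * EPOCHS)  # 24 4800
print(last_batch_size)                  # 848
print(correct_test_images)              # 8204
```

With either assumption the whole run is only a few thousand optimizer steps, a consequence of the large batch size relative to the 50,000-image training set.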
