Model Details

This model is a Vision Transformer (ViT) based on the 'vit_base_patch16_224' configuration, fine-tuned for satellite image classification on the EuroSAT dataset.

Model type: Vision Transformer (ViT)

Finetuned from model: timm/vit_base_patch16_224.augreg2_in21k_ft_in1k
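
For reference, here is a minimal inference sketch. It assumes the published checkpoint loads through the transformers AutoImageProcessor / AutoModelForImageClassification classes; if the weights are instead a timm checkpoint, timm.create_model would be the loading path, and the file name sentinel2_chip.jpg is only a placeholder.

```python
# Minimal inference sketch (assumes a transformers-compatible checkpoint).
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "cm93/vit-base-patch16-224-eurosat"
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id)
model.eval()

# Placeholder file name for a Sentinel-2 RGB chip.
image = Image.open("sentinel2_chip.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = model.config.id2label[logits.argmax(-1).item()]
print(predicted_class)
```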

Model Sources

Repository: https://github.com/chathumal93/EuroSat-RGB-Classifiers

Training Details

Training Data

The dataset comprises JPEG composite chips extracted from Sentinel-2 satellite imagery, representing the Red, Green, and Blue bands. It encompasses 27,000 labeled and geo-referenced images across 10 Land Use and Land Cover (LULC) classes.
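
For illustration, the snippet below loads the EuroSAT RGB chips. It assumes torchvision's built-in EuroSAT dataset class and a local ./data directory; neither is prescribed by this card or the linked repository.

```python
# Sketch: load the EuroSAT RGB dataset with torchvision (assumption, not the
# original training code).
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(224),   # upsample the 64x64 chips to the ViT input size
    transforms.ToTensor(),
])

eurosat = datasets.EuroSAT(root="./data", transform=transform, download=True)
print(len(eurosat), eurosat.classes)  # 27,000 images across 10 LULC classes
```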

Training Procedure

Preprocessing: standard image preprocessing, including resizing, center cropping, and normalization, plus data augmentation (RandomHorizontalFlip and RandomVerticalFlip).
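
An illustrative torchvision pipeline matching the steps named above is sketched below. The crop size and normalization statistics are assumptions based on the standard 'vit_base_patch16_224' configuration, not values quoted from the repository.

```python
# Assumed preprocessing pipeline: resize, center crop, flips, normalization.
from torchvision import transforms

IMAGENET_MEAN = (0.485, 0.456, 0.406)  # assumption: ImageNet statistics
IMAGENET_STD = (0.229, 0.224, 0.225)

train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

eval_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])
```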

Training Hyperparameters

  • Learning rate: 3e-5
  • Batch size: 64
  • Optimizer: AdamW
  • Betas: (0.9, 0.999)
  • Weight decay: 0.01
  • Scheduler: PolynomialLR
  • Loss: CrossEntropyLoss
  • Epochs: 20
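
A training-setup sketch wiring these hyperparameters together follows. The stand-in model, the dummy data, and the PolynomialLR total_iters/power values and per-epoch stepping are all assumptions; only the named optimizer, scheduler, loss, and numeric values above come from the card.

```python
# Sketch of the training setup using the listed hyperparameters (assumptions noted).
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import PolynomialLR
from torch.utils.data import DataLoader, TensorDataset
import timm

# Stand-in model and data: a timm ViT with a 10-class head and random tensors
# shaped like preprocessed EuroSAT batches (placeholders for illustration only).
model = timm.create_model("vit_base_patch16_224.augreg2_in21k_ft_in1k",
                          pretrained=True, num_classes=10)
dummy = TensorDataset(torch.randn(256, 3, 224, 224), torch.randint(0, 10, (256,)))
train_loader = DataLoader(dummy, batch_size=64, shuffle=True)

EPOCHS = 20
optimizer = AdamW(model.parameters(), lr=3e-5, betas=(0.9, 0.999), weight_decay=0.01)
scheduler = PolynomialLR(optimizer, total_iters=EPOCHS, power=1.0)  # power is an assumption
criterion = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # stepping the scheduler once per epoch is an assumption
```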

Evaluation

Results

Results for the model checkpoint at the 8th epoch, reported on the train, validation, and test splits:

| Model                        | Phase      | Avg Loss | Accuracy |
|------------------------------|------------|----------|----------|
| vit-base-patch16-224-eurosat | Train      | 0.012038 | 99.61%   |
| vit-base-patch16-224-eurosat | Validation | 0.023757 | 99.04%   |
| vit-base-patch16-224-eurosat | Test       | 0.040557 | 98.67%   |

| Model                        | Accuracy | Precision | Recall  | F1      |
|------------------------------|----------|-----------|---------|---------|
| vit-base-patch16-224-eurosat | 98.67%   | 0.98673   | 0.98667 | 0.98668 |
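
The summary metrics above could be computed as sketched below with scikit-learn. Macro averaging over the 10 classes is an assumption (the card does not state the averaging scheme), and the label lists are placeholders standing in for predictions over the test split.

```python
# Sketch: computing accuracy, precision, recall, and F1 (macro averaging assumed).
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder labels; in practice these come from running the model on the test split.
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 2]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"Accuracy={accuracy:.2%}  Precision={precision:.5f}  Recall={recall:.5f}  F1={f1:.5f}")
```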
Model size: 85.8M params (F32, Safetensors)