---
license: unknown
language:
- en
metrics:
- accuracy
- precision
- f1
- recall
tags:
- art
base_model: google/vit-base-patch16-224
datasets:
- DataScienceProject/Art_Images_Ai_And_Real_
pipeline_tag: image-classification
library_name: transformers
---
### Model Card for Model ID
This model classifies images as either 'real' or 'fake (AI-generated)' using a Vision Transformer (ViT).
Our goal is to classify the source of an image with at least 85% accuracy and at least 80% recall.
### Model Description
This model uses the Vision Transformer (ViT) architecture, which splits an image into fixed-size patches and applies self-attention across them.
It classifies images into two categories: 'real' and 'fake (AI-generated)'.
Self-attention lets it capture intricate patterns and features that help distinguish the two categories without relying on convolutional neural networks (CNNs).
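To make the patch-based processing concrete, the sketch below works out how many tokens the encoder sees per image. It assumes the geometry implied by the base checkpoint name `google/vit-base-patch16-224` (224x224 inputs, 16x16 patches, one prepended classification token); the function name is illustrative only.

```python
# Token-count arithmetic for a ViT with square patches on square inputs.
# Defaults follow the geometry implied by "vit-base-patch16-224" (an assumption
# about this model's input size, taken from the base checkpoint name).
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Return the number of tokens the encoder sees for one square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    return num_patches + 1  # +1 for the prepended [CLS] classification token


print(vit_sequence_length())  # 14 * 14 patches + 1 [CLS] token = 197
```

Self-attention then relates every one of these tokens to every other, which is how the model can pick up on image-wide cues rather than only local ones.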
### Direct Use
This model can be used to classify images as 'real art' or 'fake art' based on visual features learned by the Vision Transformer.
### Out-of-Scope Use
The model may not perform well on images outside the scope of art or where the visual characteristics are drastically different from those in the training dataset.
### Recommendations
Run the training code on a PC with an NVIDIA GPU better than an RTX 3060 and a CPU with at least 6 cores, or use Google Colab.
## How to Get Started with the Model
Prepare Data: Organize your images into appropriate folders and run the code.
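The card does not specify the folder layout, so the following is only a sketch under one common assumption: one subfolder per class under a root directory. The folder names `real` and `fake`, the label numbering, and the helper `list_labeled_images` are all hypothetical.

```python
from pathlib import Path

# Hypothetical folder names and label ids; the source does not state the layout.
CLASS_TO_LABEL = {"real": 0, "fake": 1}
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_labeled_images(root):
    """Collect (path, label) pairs from root/<class_name>/ subfolders."""
    samples = []
    for class_name, label in CLASS_TO_LABEL.items():
        class_dir = Path(root) / class_name
        if not class_dir.is_dir():
            continue  # tolerate a missing class folder
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples


if __name__ == "__main__":
    # "data/train" is a placeholder path; an absent folder yields an empty list.
    print(len(list_labeled_images("data/train")))
```

The resulting list of pairs can be handed to whatever dataset class the training code uses.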
## Model Architecture
![image/png](https://cdn-uploads.huggingface.co/production/uploads/66d6f1b3b50e35e1709bfdf7/RhONF2ZsQi_aVqyyk17yK.png)
## Training Details
- Dataset: DataScienceProject/Art_Images_Ai_And_Real_
- Preprocessing: Images are resized, converted to RGB, transformed into tensors, and stored in a custom torch dataset.
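The preprocessing steps could look roughly like the sketch below. It is not the authors' code: the class name `ArtImageDataset`, the 224x224 target size (taken from the base checkpoint name), and the scaling of pixels to [0, 1] are assumptions, and any mean/std normalization the real pipeline may apply is omitted.

```python
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class ArtImageDataset(Dataset):
    """Hypothetical dataset: resize, force RGB, convert to a float tensor."""

    def __init__(self, samples, image_size=224):
        self.samples = list(samples)  # (image_path, label) pairs
        self.image_size = image_size  # assumed; matches the ViT base checkpoint

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        with Image.open(path) as img:
            img = img.convert("RGB").resize((self.image_size, self.image_size))
            pixels = np.asarray(img, dtype=np.float32) / 255.0  # H x W x 3 in [0, 1]
        tensor = torch.from_numpy(pixels).permute(2, 0, 1).contiguous()  # 3 x H x W
        return tensor, label
```

Wrapping an instance in `torch.utils.data.DataLoader` then yields batches of shape `(batch, 3, 224, 224)`.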
#### Training Hyperparameters
- `optimizer = optim.Adam(model.parameters(), lr=0.001)`
- `num_epochs = 10`
- `criterion = nn.CrossEntropyLoss()`
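As a rough illustration of how these three settings fit together, here is a minimal PyTorch training loop. The `train` function, the toy model, and the random data are stand-ins invented for this sketch; the actual training script and ViT model are not shown in this card.

```python
import torch
from torch import nn, optim


def train(model, loader, num_epochs=10, lr=0.001):
    """Minimal loop using Adam, cross-entropy loss, and a fixed epoch count."""
    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    epoch_losses = []
    for _ in range(num_epochs):
        running_loss, num_samples = 0.0, 0
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)  # model must return raw logits
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * images.size(0)
            num_samples += images.size(0)
        epoch_losses.append(running_loss / max(num_samples, 1))
    return epoch_losses  # mean loss per epoch


if __name__ == "__main__":
    # Stand-in model and random data, only to show the loop runs end to end.
    torch.manual_seed(0)
    toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
    toy_loader = [(torch.randn(4, 3, 8, 8), torch.randint(0, 2, (4,))) for _ in range(3)]
    print(train(toy_model, toy_loader, num_epochs=2))
```

`nn.CrossEntropyLoss` expects unnormalized logits and integer class labels, so the model's final layer should not apply a softmax.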
## Evaluation
On our dataset, the model takes 15-20 minutes to run on the following hardware: CPU: i9-13900, RAM: 32 GB, GPU: RTX 3080.
Your mileage may vary.
### Testing Data, Factors & Metrics
- precision
- recall
- f1
- confusion_matrix
- accuracy
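For reference, the scalar metrics above all follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; which class counts as "positive" is not stated in this card, and the counts in the usage line are made up for illustration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only; not this model's confusion matrix.
print(binary_metrics(tp=90, fp=10, fn=5, tn=95))
```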
### Results
- test accuracy = 0.92
- precision = 0.893
- recall = 0.957
- f1 = 0.924
![image/png](https://cdn-uploads.huggingface.co/production/uploads/66d6f1b3b50e35e1709bfdf7/UYTV1X3AqFM50EFojMbn9.png)
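As a quick consistency check, the reported F1 score can be recomputed from the reported precision and recall using the standard harmonic-mean definition (with $P$ for precision and $R$ for recall):

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.893 \times 0.957}{0.893 + 0.957} = \frac{1.709202}{1.850} \approx 0.924
$$

This agrees with the F1 value listed in the results.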
#### Summary
This model performed by far the best of the approaches we tried (CNN, ResNet, CNN + ELA).