|
--- |
|
license: unknown |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
- precision |
|
- f1 |
|
- recall |
|
tags: |
|
- art |
|
base_model: google/vit-base-patch16-224 |
|
datasets: |
|
- DataScienceProject/Art_Images_Ai_And_Real_ |
|
pipeline_tag: image-classification |
|
library_name: transformers |
|
--- |
|
|
|
### Model Card
|
This model classifies images as either 'real' or 'fake (AI-generated)' using a Vision Transformer (ViT).
|
|
|
Our goal is to classify the source of an image with at least 85% accuracy and at least 80% recall.
|
|
|
### Model Description |
|
|
|
This model leverages the Vision Transformer (ViT) architecture, which applies self-attention mechanisms to process images. |
|
The model classifies images into two categories: 'real' and 'fake (AI-generated)'.
|
It captures intricate patterns and features that help in distinguishing between the two categories without the need for Convolutional Neural Networks (CNNs). |
|
|
|
### Direct Use |
|
|
|
This model can be used to classify images as 'real art' or 'fake art' based on visual features learned by the Vision Transformer. |
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
The model may not perform well on images outside the scope of art or where the visual characteristics are drastically different from those in the training dataset. |
|
|
|
|
|
### Recommendations |
|
|
|
Run the training code on a PC with an NVIDIA GPU better than an RTX 3060 and at least a 6-core CPU, or use Google Colab.
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
Prepare the data: organize your images into the appropriate folders and run the code.
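Once the model is on the Hub, inference can be as simple as the sketch below via the `transformers` pipeline API. The repository id shown is a placeholder, not this model's actual id; substitute the real one.

```python
from transformers import pipeline

# "your-username/your-model" is a placeholder -- replace it with the
# actual repository id of this fine-tuned ViT checkpoint.
classifier = pipeline("image-classification", model="your-username/your-model")

# Classify a local image file; returns a list of labels with scores.
result = classifier("example_artwork.jpg")
print(result)
```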
|
|
|
## Model Architecture
|
|
|
 |
|
|
|
## Training Details |
|
|
|
- Dataset: DataScienceProject/Art_Images_Ai_And_Real_
|
|
|
Preprocessing: images are resized, converted to RGB, transformed into tensors, and stored in a custom torch Dataset.
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
```python
optimizer = optim.Adam(model.parameters(), lr=0.001)
num_epochs = 10
criterion = nn.CrossEntropyLoss()
```
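These hyperparameters plug into a standard PyTorch training loop. The sketch below is an assumption about how they were used, with illustrative `model` and `train_loader` names, not the actual training script:

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train(model, train_loader, device, num_epochs=10, lr=0.001):
    """Illustrative loop using the hyperparameters listed above."""
    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.to(device)
    model.train()
    for epoch in range(num_epochs):
        total_loss = 0.0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            # For a HF ViT, use model(pixel_values=images).logits instead.
            logits = model(images)
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"epoch {epoch + 1}: loss {total_loss / len(train_loader):.4f}")
```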
|
|
|
## Evaluation |
|
|
|
Training takes 15-20 minutes on our dataset with the following hardware: CPU: Intel i9-13900, RAM: 32 GB, GPU: RTX 3080.

Your mileage may vary.
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
- precision

- recall

- f1

- confusion_matrix

- accuracy
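All of these metrics are available in scikit-learn. A minimal sketch, assuming the binary label encoding 0 = real, 1 = AI-generated (the actual mapping may differ):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

def evaluate(y_true, y_pred):
    """Compute the metrics listed above for binary predictions."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
```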
|
|
|
|
|
### Results |
|
|
|
- test accuracy = 0.92

- precision = 0.893

- recall = 0.957

- f1 = 0.924
|
|
|
 |
|
|
|
|
|
|
|
#### Summary |
|
|
|
This model is by far the best of the approaches we tried (CNN, ResNet, CNN + ELA).
|
|
|
|