---
license: unknown
language:
- en
metrics:
- accuracy
- precision
- f1
- recall
tags:
- art
base_model: google/vit-base-patch16-224
datasets:
- DataScienceProject/Art_Images_Ai_And_Real_
pipeline_tag: image-classification
library_name: transformers
---

### Model Card for Model ID
This model classifies images as either 'real' or 'fake (AI-generated)' using a Vision Transformer (ViT).

Our goal is to identify the source of an image with at least 85% accuracy and at least 80% recall.

### Model Description

This model leverages the Vision Transformer (ViT) architecture, which applies self-attention mechanisms to process images.
The model classifies images into two categories: 'real' and 'fake (AI-generated)'.
It captures intricate patterns and features that help in distinguishing between the two categories without the need for Convolutional Neural Networks (CNNs).
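
To make the patch-based processing concrete, the sketch below works out how many tokens the base model named in the metadata, `google/vit-base-patch16-224`, passes to its self-attention layers. It assumes the standard ViT layout (non-overlapping square patches plus one classification token); the function name is illustrative and not part of this project's code.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Number of tokens a standard ViT sees for one square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    return num_patches + 1  # +1 for the classification ([CLS]) token


print(vit_sequence_length())  # 14 * 14 patches + 1 = 197
```

Each of those tokens attends to every other token, which is how the model relates distant image regions without convolutions.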

### Direct Use

This model can be used to classify images as 'real art' or 'fake art' based on visual features learned by the Vision Transformer.


### Out-of-Scope Use

The model may not perform well on images outside the scope of art or where the visual characteristics are drastically different from those in the training dataset.


### Recommendations

Run the training code on a PC with an NVIDIA GPU better than an RTX 3060 and a CPU with at least 6 cores, or use Google Colab.


## How to Get Started with the Model

Prepare Data: Organize your images into appropriate folders and run the code.
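
As a rough illustration of the folder organization, the sketch below collects image paths and labels from one subfolder per class. The folder names `real` and `fake`, the extension list, and the helper name are all assumptions made for this example; adjust them to match the layout the training code expects.

```python
from pathlib import Path

# Assumed layout (hypothetical names, not taken from the training code):
#   data/real/...  and  data/fake/...
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_labeled_images(root, class_names=("real", "fake")):
    """Return sorted (path, label_index) pairs, one class per subfolder."""
    samples = []
    for label, name in enumerate(class_names):
        folder = Path(root) / name
        if not folder.is_dir():
            raise FileNotFoundError(f"missing class folder: {folder}")
        for path in sorted(folder.iterdir()):
            # Skip anything that is not an image file (notes, caches, ...).
            if path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((path, label))
    return samples
```

Calling `list_labeled_images("data")` would then return every image under `data/real` with label 0 and every image under `data/fake` with label 1.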

## Model Architecture

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66d6f1b3b50e35e1709bfdf7/RhONF2ZsQi_aVqyyk17yK.png)

## Training Details

Dataset: DataScienceProject/Art_Images_Ai_And_Real_

Preprocessing: Images are resized, converted to RGB, transformed into tensors, and stored in a custom torch dataset.


#### Training Hyperparameters

- Optimizer: `optim.Adam(model.parameters(), lr=0.001)`
- Epochs: `num_epochs = 10`
- Loss: `criterion = nn.CrossEntropyLoss()`
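
For context, here is a minimal sketch of how these three settings typically fit into a PyTorch training loop. The `train` helper, the toy linear model, and the random tensors are stand-ins so the example runs anywhere; they are not the project's training code, and the real run would pass the ViT classifier and the art image dataset instead.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def train(model, loader, num_epochs=10, lr=0.001):
    """Train with Adam and cross-entropy; return the mean loss per epoch."""
    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    history = []
    model.train()
    for _ in range(num_epochs):
        total, count = 0.0, 0
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            total += loss.item() * labels.size(0)
            count += labels.size(0)
        history.append(total / count)
    return history


# Stand-in data and model (assumptions for illustration only).
torch.manual_seed(0)
data = TensorDataset(torch.randn(16, 3, 8, 8), torch.randint(0, 2, (16,)))
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
losses = train(toy_model, DataLoader(data, batch_size=4), num_epochs=10)
```

`nn.CrossEntropyLoss` expects raw logits and integer class labels, so the model's final layer should not apply a softmax.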

## Evaluation

On our dataset, the model takes 15-20 minutes to run on the following hardware: CPU: i9 13900, RAM: 32 GB, GPU: RTX 3080.
Your mileage may vary.

### Testing Data, Factors & Metrics

- precision
- recall
- f1
- confusion_matrix
- accuracy
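
All four scalar metrics can be derived from the confusion matrix. The sketch below shows the standard definitions for a binary task; which class counts as "positive" is not stated in this card, so that choice, the function name, and the example counts are assumptions.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, not the model's actual confusion matrix.
example = binary_metrics(tp=8, fp=2, fn=1, tn=9)
```

Precision penalizes false positives and recall penalizes false negatives, which is why both are reported alongside accuracy.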


### Results

- test accuracy = 0.92

- precision = 0.893

- recall = 0.957

- f1 = 0.924
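
As a quick sanity check, the reported F1 score can be recomputed from the reported precision and recall, and the results compared with the stated goals (at least 85% accuracy and 80% recall). The variable names are illustrative; the numbers are the ones listed above.

```python
precision, recall, accuracy = 0.893, 0.957, 0.92

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.924, matching the reported value

# Targets stated at the top of this card.
meets_goals = accuracy >= 0.85 and recall >= 0.80
print(meets_goals)  # True
```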


![image/png](https://cdn-uploads.huggingface.co/production/uploads/66d6f1b3b50e35e1709bfdf7/UYTV1X3AqFM50EFojMbn9.png)



#### Summary

This model performed best by far among the approaches we tried (CNN, ResNet, CNN + ELA).