Image Classification
Transformers
English
art
benjaminStreltzin committed on
Commit 8ece166 · verified · 1 Parent(s): a735f89

Update README.md

Files changed (1)
  1. README.md +35 -25
README.md CHANGED
@@ -4,6 +4,9 @@ language:
 - en
 metrics:
 - accuracy
 tags:
 - art
 base_model: google/vit-base-patch16-224
@@ -12,15 +15,12 @@ datasets:
 pipeline_tag: image-classification
 library_name: transformers
 ---
-# Model Card for Model ID
 
-## Model Details
-
 ### Model Description
 
 This model leverages the Vision Transformer (ViT) architecture, which applies self-attention mechanisms to process images.
-The model classifies images into two categories: 'real art' and 'fake art'.
 It captures intricate patterns and features that help in distinguishing between the two categories without the need for Convolutional Neural Networks (CNNs).
 
 ### Direct Use
@@ -30,56 +30,66 @@ This model can be used to classify images as 'real art' or 'fake art' based on v
 
 ### Out-of-Scope Use
 
-The model may not perform optimally on images outside the art domain or on artworks
-with significantly different visual characteristics compared to those in the training dataset.
-It is not suitable for medical imaging or other non-artistic visual tasks.
-
-## Bias, Risks, and Limitations
-
-Users should be mindful of the model's limitations and potential biases, particularly regarding artworks that differ significantly from the training data.
-Regular updates and evaluations may be necessary to ensure the model remains accurate and effective.
 
 ### Recommendations
 
 ## How to Get Started with the Model
 
-Prepare Data: Organize your images into appropriate folders, ensuring they are resized and normalized.
-Train the Model: Utilize the provided code to train the Vision Transformer model on your dataset.
-Evaluate: Assess the model's performance on a separate test set of images.
 
 ## Training Details
 
-### Training Data
-
-Dataset: [Link to dataset or description]
-Preprocessing: Images are resized, normalized, and prepared for input to the Vision Transformer.
-
-### Training Procedure
-
-Images are resized to a uniform dimension and normalized. The Vision Transformer model is then trained on these preprocessed images.
 
-#### Training Hyperparameters
 
-## Evaluation
 
-### Testing Data, Factors & Metrics
 
-#### Testing Data
 
-### Results
 
-#### Summary
 
 - en
 metrics:
 - accuracy
+- precision
+- f1
+- recall
 tags:
 - art
 base_model: google/vit-base-patch16-224
 pipeline_tag: image-classification
 library_name: transformers
 ---
 
 ### Model Description
 
 This model leverages the Vision Transformer (ViT) architecture, which applies self-attention mechanisms to process images.
+The model classifies images into two categories: 'real' and 'fake (AI-generated)'.
 It captures intricate patterns and features that help in distinguishing between the two categories without the need for Convolutional Neural Networks (CNNs).
 
 ### Direct Use
 
 ### Out-of-Scope Use
 
+The model may not perform well on images outside the scope of art, or on images whose visual characteristics are drastically different from those in the training dataset.
 
 ### Recommendations
 
+Run the training code on a machine with an NVIDIA GPU at least as capable as an RTX 3060 and a CPU with six or more cores, or use Google Colab.
 
 ## How to Get Started with the Model
 
+Prepare Data: Organize your images into appropriate folders and run the code.
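Once the model is trained, inference reduces to a forward pass over a preprocessed image tensor. The sketch below is illustrative, not part of the released code: the helper name `classify` and the label order `('real', 'fake')` are assumptions — with the actual checkpoint you would load it via `transformers`' `ViTForImageClassification` and read the label mapping from the model config.

```python
import torch

@torch.no_grad()
def classify(model, x, labels=("real", "fake")):
    """Run one preprocessed image tensor (C, H, W) through the classifier.

    NOTE: the label order is an assumed placeholder; take it from the
    model's id2label mapping when using the real checkpoint.
    """
    model.eval()
    out = model(x.unsqueeze(0))              # add a batch dimension
    # ViTForImageClassification returns an object with .logits;
    # a plain nn.Module returns the logits tensor directly.
    logits = getattr(out, "logits", out)
    probs = torch.softmax(logits, dim=-1)[0]
    idx = int(probs.argmax())
    return labels[idx], float(probs[idx])
```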
 
 
 
+## Model Architecture
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/66d6f1b3b50e35e1709bfdf7/RhONF2ZsQi_aVqyyk17yK.png)
 
 ## Training Details
 
+- Dataset: DataScienceProject/Art_Images_Ai_And_Real_
+
+Preprocessing: Images are resized, converted to RGB, transformed into tensors, and stored in a custom torch Dataset.
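The preprocessing described above can be sketched as a small custom torch `Dataset`. This is a hedged illustration, not the card's released code: it assumes a list of `(image_path, label)` pairs and the 224×224 input size of the base ViT model, and it omits any normalization statistics.

```python
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class ArtDataset(Dataset):
    """Assumed layout: a list of (image_path, label) pairs.
    Resizes, converts to RGB, and returns CHW float tensors in [0, 1]."""

    def __init__(self, samples, size=224):
        self.samples = samples
        self.size = size

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        img = Image.open(path).convert("RGB").resize((self.size, self.size))
        arr = np.asarray(img, dtype=np.float32) / 255.0   # HWC, scaled to [0, 1]
        x = torch.from_numpy(arr).permute(2, 0, 1)        # CHW, as ViT expects
        return x, label
```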
 
 
+#### Training Hyperparameters
 
+optimizer = optim.Adam(model.parameters(), lr=0.001)
+num_epochs = 10
+criterion = nn.CrossEntropyLoss()
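A minimal training loop consistent with these hyperparameters (Adam at lr=0.001, CrossEntropyLoss, 10 epochs) might look as follows. The `train` helper and the plain-tensor model interface are assumptions for the sketch; with `transformers`' `ViTForImageClassification` the forward pass would be `model(images).logits`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train(model, loader, num_epochs=10, lr=0.001, device="cpu"):
    """Illustrative loop using the hyperparameters listed above."""
    model.to(device).train()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        running = 0.0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            logits = model(images)  # a ViTForImageClassification needs model(images).logits
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
            running += loss.item()
        print(f"epoch {epoch + 1}: mean loss {running / max(len(loader), 1):.4f}")
    return model
```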
 
+## Evaluation
 
+Training takes 15-20 minutes on our dataset with the following PC hardware: CPU: i9-13900, RAM: 32 GB, GPU: RTX 3080.
+Your mileage may vary.
 
+### Testing Data, Factors & Metrics
 
+- precision
+- recall
+- f1
+- confusion_matrix
+- accuracy
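The metrics listed above can be computed in one place with scikit-learn (an assumption — the card does not say which library was used) from the test-set labels and predictions:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

def evaluate(y_true, y_pred):
    """Compute the card's metrics for the binary real-vs-fake task.
    Assumes the positive class is encoded as 1."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
```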
 
+### Results
 
+- test accuracy = 0.92
+- precision = 0.893
+- recall = 0.957
+- f1 = 0.924
 
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/66d6f1b3b50e35e1709bfdf7/UYTV1X3AqFM50EFojMbn9.png)
 
+#### Summary
 
+This model performed by far the best of the approaches we tried (CNN, ResNet, CNN + ELA).