Image Classification
Transformers
English
art
benjaminStreltzin committed on
Commit 8ece166 · verified · 1 Parent(s): a735f89

Update README.md

Files changed (1)
  1. README.md +35 -25
README.md CHANGED
@@ -4,6 +4,9 @@ language:
 - en
 metrics:
 - accuracy
 tags:
 - art
 base_model: google/vit-base-patch16-224
@@ -12,15 +15,12 @@ datasets:
 pipeline_tag: image-classification
 library_name: transformers
 ---
-# Model Card for Model ID
 
-## Model Details
-
 ### Model Description
 
 This model leverages the Vision Transformer (ViT) architecture, which applies self-attention mechanisms to process images.
-The model classifies images into two categories: 'real art' and 'fake art'.
 It captures intricate patterns and features that help in distinguishing between the two categories without the need for Convolutional Neural Networks (CNNs).
 
 ### Direct Use
@@ -30,56 +30,66 @@ This model can be used to classify images as 'real art' or 'fake art' based on v
 
 ### Out-of-Scope Use
 
-The model may not perform optimally on images outside the art domain or on artworks
-with significantly different visual characteristics compared to those in the training dataset.
-It is not suitable for medical imaging or other non-artistic visual tasks.
-
-## Bias, Risks, and Limitations
-
-Users should be mindful of the model's limitations and potential biases, particularly regarding artworks that differ significantly from the training data.
-Regular updates and evaluations may be necessary to ensure the model remains accurate and effective.
 
 ### Recommendations
 
 ## How to Get Started with the Model
 
-Prepare Data: Organize your images into appropriate folders, ensuring they are resized and normalized.
-Train the Model: Utilize the provided code to train the Vision Transformer model on your dataset.
-Evaluate: Assess the model's performance on a separate test set of images.
 
 ## Training Details
 
-### Training Data
-
-Dataset: [Link to dataset or description]
-Preprocessing: Images are resized, normalized, and prepared for input to the Vision Transformer.
-
-### Training Procedure
-
-Images are resized to a uniform dimension and normalized. The Vision Transformer model is then trained on these preprocessed images.
 
-#### Training Hyperparameters
 
-## Evaluation
 
-### Testing Data, Factors & Metrics
 
-#### Testing Data
 
-### Results
 
-#### Summary
 
 - en
 metrics:
 - accuracy
+- precision
+- f1
+- recall
 tags:
 - art
 base_model: google/vit-base-patch16-224
 pipeline_tag: image-classification
 library_name: transformers
 ---
 
 ### Model Description
 
 This model leverages the Vision Transformer (ViT) architecture, which applies self-attention mechanisms to process images.
+The model classifies images into two categories: 'real' and 'fake (AI-generated)'.
 It captures intricate patterns and features that help in distinguishing between the two categories without the need for Convolutional Neural Networks (CNNs).
 
 ### Direct Use
 
 ### Out-of-Scope Use
 
+The model may not perform well on images outside the scope of art, or on images whose visual characteristics are drastically different from those in the training dataset.
 
 ### Recommendations
 
+Run the training code on a machine with an NVIDIA GPU at least as capable as an RTX 3060 and a CPU with six or more cores, or use Google Colab.
 
 ## How to Get Started with the Model
 
+Prepare Data: Organize your images into appropriate folders and run the code.
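Once the model is trained, inference reduces to a forward pass over a preprocessed image tensor. The sketch below is illustrative, not part of the released code: the helper name `classify` and the label order `('real', 'fake')` are assumptions — with the actual checkpoint you would load it via `transformers`' `ViTForImageClassification` and read the label mapping from the model config.

```python
import torch

@torch.no_grad()
def classify(model, x, labels=("real", "fake")):
    """Run one preprocessed image tensor (C, H, W) through the classifier.

    NOTE: the label order is an assumed placeholder; take it from the
    model's id2label mapping when using the real checkpoint.
    """
    model.eval()
    out = model(x.unsqueeze(0))              # add a batch dimension
    # ViTForImageClassification returns an object with .logits;
    # a plain nn.Module returns the logits tensor directly.
    logits = getattr(out, "logits", out)
    probs = torch.softmax(logits, dim=-1)[0]
    idx = int(probs.argmax())
    return labels[idx], float(probs[idx])
```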
 
 
 
+## Model Architecture
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/66d6f1b3b50e35e1709bfdf7/RhONF2ZsQi_aVqyyk17yK.png)
 
 ## Training Details
 
+- Dataset: DataScienceProject/Art_Images_Ai_And_Real_
+
+Preprocessing: Images are resized, converted to RGB, transformed into tensors, and stored in a custom torch Dataset.
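The preprocessing described above can be sketched as a small custom torch `Dataset`. This is a hedged illustration, not the card's released code: it assumes a list of `(image_path, label)` pairs and the 224×224 input size of the base ViT model, and it omits any normalization statistics.

```python
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class ArtDataset(Dataset):
    """Assumed layout: a list of (image_path, label) pairs.
    Resizes, converts to RGB, and returns CHW float tensors in [0, 1]."""

    def __init__(self, samples, size=224):
        self.samples = samples
        self.size = size

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        img = Image.open(path).convert("RGB").resize((self.size, self.size))
        arr = np.asarray(img, dtype=np.float32) / 255.0   # HWC, scaled to [0, 1]
        x = torch.from_numpy(arr).permute(2, 0, 1)        # CHW, as ViT expects
        return x, label
```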
 
 
+#### Training Hyperparameters
 
+optimizer = optim.Adam(model.parameters(), lr=0.001)
+num_epochs = 10
+criterion = nn.CrossEntropyLoss()
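A minimal training loop consistent with these hyperparameters (Adam at lr=0.001, CrossEntropyLoss, 10 epochs) might look as follows. The `train` helper and the plain-tensor model interface are assumptions for the sketch; with `transformers`' `ViTForImageClassification` the forward pass would be `model(images).logits`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train(model, loader, num_epochs=10, lr=0.001, device="cpu"):
    """Illustrative loop using the hyperparameters listed above."""
    model.to(device).train()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        running = 0.0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            logits = model(images)  # a ViTForImageClassification needs model(images).logits
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
            running += loss.item()
        print(f"epoch {epoch + 1}: mean loss {running / max(len(loader), 1):.4f}")
    return model
```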
 
+## Evaluation
 
+Training takes 15-20 minutes on our dataset with the following PC hardware: CPU: i9-13900, RAM: 32 GB, GPU: RTX 3080.
+Your mileage may vary.
 
+### Testing Data, Factors & Metrics
 
+- precision
+- recall
+- f1
+- confusion_matrix
+- accuracy
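The metrics listed above can be computed in one place with scikit-learn (an assumption — the card does not say which library was used) from the test-set labels and predictions:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

def evaluate(y_true, y_pred):
    """Compute the card's metrics for the binary real-vs-fake task.
    Assumes the positive class is encoded as 1."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
```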
 
+### Results
 
+- test accuracy = 0.92
+- precision = 0.893
+- recall = 0.957
+- f1 = 0.924
 
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/66d6f1b3b50e35e1709bfdf7/UYTV1X3AqFM50EFojMbn9.png)
 
+#### Summary
 
+This model performed by far the best of the approaches we tried (CNN, ResNet, CNN + ELA).