DataScienceProject
/

Vit

Image Classification

benjaminStreltzin

Update README.md

79e4799 verified 10 months ago


	---
	license: unknown
	language:
	- en
	metrics:
	- accuracy
	- precision
	- f1
	- recall
	tags:
	- art
	base_model: google/vit-base-patch16-224
	datasets:
	- DataScienceProject/Art_Images_Ai_And_Real_
	pipeline_tag: image-classification
	library_name: transformers
	---

	### Model Card for Vit
	This model classifies images as either 'real' or 'fake - AI generated' using a Vision Transformer (ViT).

	Our goal is to identify the source of each image with at least 85% accuracy and at least 80% recall.

	### Model Description

	This model leverages the Vision Transformer (ViT) architecture, which applies self-attention mechanisms to process images.
	The model classifies images into two categories: 'real' and 'fake - AI generated'.
	It captures intricate patterns and features that help in distinguishing between the two categories without the need for Convolutional Neural Networks (CNNs).

	### Direct Use

	This model can be used to classify images as 'real art' or 'fake art' based on visual features learned by the Vision Transformer.


	### Out-of-Scope Use

	The model may not perform well on images outside the scope of art or where the visual characteristics are drastically different from those in the training dataset.


	### Recommendations

	Run the training code on a PC with an NVIDIA GPU better than an RTX 3060 and a CPU with at least 6 cores, or use Google Colab.


	## How to Get Started with the Model

	Prepare Data: Organize your images into appropriate folders and run the code.
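The folder layout is not specified here, so the following is only a minimal sketch of one possible arrangement: a root directory with one subfolder per class. The folder names `real` and `fake`, the label values, the extension list, and the helper `list_labeled_images` are all assumptions made for illustration, not part of the project's code.

```python
from pathlib import Path

# Assumed layout (hypothetical): <root>/real/*.jpg and <root>/fake/*.jpg
CLASS_TO_LABEL = {"real": 0, "fake": 1}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def list_labeled_images(root):
    """Return a list of (path, label) pairs, one per image file found."""
    samples = []
    for class_name, label in CLASS_TO_LABEL.items():
        class_dir = Path(root) / class_name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"missing class folder: {class_dir}")
        # Walk the class folder recursively and keep only image files.
        for path in sorted(class_dir.rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((path, label))
    return samples
```

Calling `list_labeled_images("data")` on such a layout yields pairs that can be fed to whatever dataset class the training code uses.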

	## Model Architecture

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/66d6f1b3b50e35e1709bfdf7/RhONF2ZsQi_aVqyyk17yK.png)

	## Training Details

	- Dataset: DataScienceProject/Art_Images_Ai_And_Real_

	- Preprocessing: Images are resized, converted to RGB, transformed into tensors, and stored in a custom torch dataset.


	#### Training Hyperparameters

	optimizer = optim.Adam(model.parameters(), lr=0.001)
	num_epochs = 10
	criterion = nn.CrossEntropyLoss()
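
As a rough illustration of how these three settings fit together, here is a minimal training-loop sketch. The function `train`, its arguments, and the assumption that `model(images)` returns a raw logits tensor are hypothetical; a Hugging Face ViT model, for example, returns an output object whose `.logits` field would be used instead.

```python
import torch
from torch import nn, optim


def train(model, loader, num_epochs=10, lr=0.001, device="cpu"):
    """Train with Adam and cross-entropy; return the mean loss per epoch."""
    model.to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    history = []
    for _ in range(num_epochs):
        model.train()
        total_loss, seen = 0.0, 0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            logits = model(images)  # assumed shape: (batch, num_classes)
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
            # Weight the batch loss by batch size for a correct epoch mean.
            total_loss += loss.item() * labels.size(0)
            seen += labels.size(0)
        history.append(total_loss / max(seen, 1))
    return history
```

Passing `device="cuda"` moves the model and batches to the GPU when one is available.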

	## Evaluation

	On our dataset, the model takes 15-20 minutes to run on the following hardware: CPU: i9 13900, RAM: 32 GB, GPU: RTX 3080.
	Your mileage may vary.

	### Testing Data, Factors & Metrics

	- precision
	- recall
	- f1
	- confusion_matrix
	- accuracy
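
For reference, the scalar metrics listed above can all be derived from the four cells of a binary confusion matrix. The sketch below assumes one class is treated as "positive" (which class that is in this project is not stated), and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much was correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

A call such as `binary_metrics(tp=90, fp=10, fn=5, tn=95)` uses made-up counts purely to exercise the function; they are not this model's results.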


	### Results

	- test accuracy = 0.92
	- precision = 0.893
	- recall = 0.957
	- f1 = 0.924

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/66d6f1b3b50e35e1709bfdf7/UYTV1X3AqFM50EFojMbn9.png)
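
As a quick consistency check, the reported F1 follows from the reported precision $P$ and recall $R$ through the standard harmonic-mean formula:

$$
F_1 = \frac{2 \cdot P \cdot R}{P + R} = \frac{2 \cdot 0.893 \cdot 0.957}{0.893 + 0.957} = \frac{1.709}{1.850} \approx 0.924
$$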



	#### Summary

	This model performed best by far among the approaches we tried (CNN, ResNet, CNN + ELA).