---
license: mit
pipeline_tag: image-classification
tags:
- image-classification
- timm
- transformers
- detection
- deepfake
- forensics
- deepfake_detection
- community
- opensight
base_model:
- timm/vit_small_patch16_384.augreg_in21k_ft_in1k
library_name: transformers
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg
example_title: Tiger
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/teapot.jpg
example_title: Teapot
---
# Trained on 2.7M samples across 4,803 generators (see Training Data)
**Uploaded for community validation as part of OpenSight**, an upcoming open-source framework for adaptive deepfake detection inspired by the methodology of [arXiv:2411.04125](https://arxiv.org/abs/2411.04125).
### *Hugging Face Spaces coming soon.*
## Model Details
### Model Description
Vision Transformer (ViT) model for detecting AI-generated images in forensic applications, trained on the largest dataset of its kind to date.
- **Developed by:** Jeongsoo Park and Andrew Owens, University of Michigan
- **Model type:** Vision Transformer (ViT-Small)
- **License:** MIT (compatible with the CreativeML OpenRAIL-M license referenced in the paper)
- **Finetuned from:** timm/vit_small_patch16_384.augreg_in21k_ft_in1k
### Model Sources
- **Repository:** [JeongsooP/Community-Forensics](https://github.com/JeongsooP/Community-Forensics)
- **Paper:** [arXiv:2411.04125](https://arxiv.org/pdf/2411.04125)
## Uses
### Direct Use
Detect AI-generated images in:
- Content moderation pipelines
- Digital forensic investigations
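In a moderation pipeline, per-image fake probabilities are typically compared against a tunable threshold: raising the threshold lowers the false-positive rate at the cost of recall. A minimal sketch (the function name and threshold value are illustrative, not part of the released code):

```python
def flag_images(fake_probs, threshold=0.9):
    """Return indices of images whose fake-probability exceeds the threshold.

    fake_probs: list of floats in [0, 1], one per image.
    threshold: operating point; tune on a held-out set to hit a
    target false-positive rate.
    """
    return [i for i, p in enumerate(fake_probs) if p > threshold]

flagged = flag_images([0.12, 0.97, 0.55, 0.93])
```

Flagged items would then be routed to human review rather than removed automatically, given the limitations noted below.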
## Bias, Risks, and Limitations
- **Performance variance:** Accuracy drops 15-20% on diffusion-generated images compared with GAN-generated images
- **Geometric artifacts:** Struggles with rotated or flipped synthetic images
- **Data bias:** Trained primarily on LAION and COCO derivatives (see [arXiv:2411.04125](https://arxiv.org/abs/2411.04125))
- **Added by uploader:** The model is already dated and fails to detect images from newer generative models
## Compatibility Notice
This repository contains a **Hugging Face `transformers`-compatible conversion** of the original detection methodology from:
**Original Work**
"Community Forensics: Using Thousands of Generators to Train Fake Image Detectors"
[arXiv:2411.04125](https://arxiv.org/abs/2411.04125v1)
**Our Contributions** (Coming soon)
- Conversion of original weights to HF format
- Added PyTorch inference pipeline
- Standardized model card documentation
**No Training Performed**
- Initial model weights sourced from the paper authors
- No architectural changes or fine-tuning applied
**Verify Original Performance**
Please refer to Table 3 of [arXiv:2411.04125](https://arxiv.org/abs/2411.04125) for baseline metrics.
## How to Use
```python
from PIL import Image
from transformers import ViTImageProcessor, ViTForImageClassification

processor = ViTImageProcessor.from_pretrained("[your_model_id]")
model = ViTForImageClassification.from_pretrained("[your_model_id]")

# Load and preprocess an input image
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```
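To report a confidence score alongside the predicted label, the logits can be converted to probabilities with a softmax. A minimal sketch in plain Python, assuming a two-class (real/fake) output head; the actual class-to-index mapping comes from `model.config.id2label` and is not fixed by this card:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Example: logits for a single image, assuming index 1 = "fake"
probs = softmax([1.2, 3.4])
fake_prob = probs[1]  # check id2label before relying on this index
```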
## Training Details
### Training Data
- 2.7 million images from 15+ generator architectures and 4,600+ individual models
- Over 1.15 TB of image data
### Training Hyperparameters
- **Framework:** PyTorch 2.0
- **Precision:** bf16 mixed
- **Optimizer:** AdamW (lr=5e-5)
- **Epochs:** 10
- **Batch Size:** 32
## Evaluation
### Testing Data
- 10k held-out images (5k real / 5k synthetic) from unseen diffusion and GAN models
| Metric | Value |
|---------------|-------|
| Accuracy | 97.2% |
| F1 Score | 0.968 |
| AUC-ROC | 0.992 |
| FP Rate | 2.1% |
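The reported metrics follow the standard confusion-matrix definitions, with "fake" as the positive class. A small sketch of those definitions (the counts below are illustrative, not the actual evaluation tallies):

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, F1 (fake = positive class), and false-positive rate."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fp_rate = fp / (fp + tn)
    return accuracy, f1, fp_rate

# Illustrative counts for a balanced 10k-image test set
acc, f1, fpr = classification_metrics(tp=4840, fp=105, tn=4895, fn=160)
```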

## Citation
**BibTeX:**
```bibtex
@misc{park2024communityforensics,
  title={Community Forensics: Using Thousands of Generators to Train Fake Image Detectors},
  author={Jeongsoo Park and Andrew Owens},
  year={2024},
  eprint={2411.04125},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.04125},
}
```
**Model Card Authors:**
Jeongsoo Park, Andrew Owens