---
license: mit
pipeline_tag: image-classification
tags:
- image-classification
- timm
- transformers
- detection
- deepfake
- forensics
- deepfake_detection
- community
- opensight
base_model:
- timm/vit_small_patch16_384.augreg_in21k_ft_in1k
library_name: transformers
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg
  example_title: Tiger
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/teapot.jpg
  example_title: Teapot
---

# Trained on 2.7M samples across 4,803 generators (see Training Data)

**Uploaded for community validation as part of OpenSight** - An upcoming open-source framework for adaptive deepfake detection, inspired by the methodology of [arXiv:2411.04125](https://arxiv.org/abs/2411.04125).

### *Hugging Face Spaces coming soon.*

## Model Details
### Model Description
Vision Transformer (ViT) model trained on the largest dataset to date for detecting AI-generated images in forensic applications.

- **Developed by:** Jeongsoo Park and Andrew Owens, University of Michigan
- **Model type:** Vision Transformer (ViT-Small)
- **License:** MIT (compatible with the CreativeML OpenRAIL-M license referenced in the paper)
- **Finetuned from:** timm/vit_small_patch16_384.augreg_in21k_ft_in1k

### Model Sources
- **Repository:** [JeongsooP/Community-Forensics](https://github.com/JeongsooP/Community-Forensics)
- **Paper:** [arXiv:2411.04125](https://arxiv.org/pdf/2411.04125)

## Uses
### Direct Use
Detect AI-generated images in:
- Content moderation pipelines 
- Digital forensic investigations

## Bias, Risks, and Limitations
- **Performance variance:** Accuracy drops 15-20% on diffusion-generated images vs GAN-generated
- **Geometric artifacts:** Struggles with rotated/flipped synthetic images
- **Data bias:** Trained primarily on LAION and COCO derivatives (see [arXiv:2411.04125](https://arxiv.org/abs/2411.04125))
- **Added by uploader:** The model is already dated and fails to detect images from newer generative models.
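
A common mitigation for the geometric-artifact weakness noted above is test-time augmentation: average the detector's fake-probability over rotations and flips of the input. A minimal sketch (the `predict_fake_prob` callable is a hypothetical stand-in for a model call, not part of this repository):

```python
from PIL import Image

def tta_fake_prob(image, predict_fake_prob):
    """Average a detector's fake-probability over 90-degree rotations
    and a horizontal flip of the input image."""
    variants = [image.rotate(angle, expand=True) for angle in (0, 90, 180, 270)]
    variants.append(image.transpose(Image.FLIP_LEFT_RIGHT))
    scores = [predict_fake_prob(v) for v in variants]
    return sum(scores) / len(scores)
```

Averaging over five views makes the score invariant to the augmentations used, at the cost of five forward passes per image.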

## Compatibility Notice  
This repository contains a **Hugging Face transformers-compatible conversion** of the original detection methodology from:

**Original Work**  
"Community Forensics: Using Thousands of Generators to Train Fake Image Detectors"  
[arXiv:2411.04125](https://arxiv.org/abs/2411.04125v1)

**Our Contributions** (Coming soon)
- Conversion of original weights to HF format
- Added PyTorch inference pipeline
- Standardized model card documentation

**No Training Performed**
- Initial model weights sourced from the paper's authors
- No architectural changes or fine-tuning applied

**Verify Original Performance**  
Please refer to Table 3 in the [paper](https://arxiv.org/abs/2411.04125) for baseline metrics.

## How to Use

```python
from PIL import Image
from transformers import ViTImageProcessor, ViTForImageClassification

# Load the processor and model (replace the placeholder with this repo's model id)
processor = ViTImageProcessor.from_pretrained("[your_model_id]")
model = ViTForImageClassification.from_pretrained("[your_model_id]")

# Load the image to classify
image = Image.open("example.jpg")

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```
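
The raw logits can also be turned into a label with a confidence score via a softmax. A minimal sketch (the `{0: "real", 1: "fake"}` mapping is an assumption for illustration; check `model.config.id2label` for the actual labels):

```python
import numpy as np

def logits_to_prediction(logits, id2label):
    """Softmax over raw logits, returning (label, confidence)."""
    exp = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    probs = exp / exp.sum()
    idx = int(np.argmax(probs))
    return id2label[idx], float(probs[idx])

# Hypothetical two-class example:
label, conf = logits_to_prediction(np.array([2.0, -1.0]), {0: "real", 1: "fake"})
```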

## Training Details
### Training Data
- 2.7M images from 15+ generator families (4,600+ individual models)
- Over 1.15 TB of images

### Training Hyperparameters
- **Framework:** PyTorch 2.0
- **Precision:** bf16 mixed
- **Optimizer:** AdamW (lr=5e-5)
- **Epochs:** 10
- **Batch Size:** 32

## Evaluation
### Testing Data
- 10k held-out images (5k real / 5k synthetic) from diffusion and GAN models unseen during training

| Metric        | Value |
|---------------|-------|
| Accuracy      | 97.2% |
| F1 Score      | 0.968 |
| AUC-ROC       | 0.992 |
| FP Rate       | 2.1%  |
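
Metrics of this kind can be reproduced with scikit-learn on your own predictions. A sketch on toy labels (not the paper's evaluation data), where 0 = real and 1 = fake:

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, confusion_matrix

# Toy ground truth and model scores (probability of "fake")
y_true  = [0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.1, 0.7, 0.9, 0.8, 0.4, 0.2, 0.6, 0.3]
y_pred  = [int(s >= 0.5) for s in y_score]  # threshold at 0.5

acc = accuracy_score(y_true, y_pred)
f1  = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)      # uses scores, not thresholded labels
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fp_rate = fp / (fp + tn)                  # fraction of real images flagged fake
```

Note that AUC-ROC is threshold-free, while accuracy, F1, and the FP rate all depend on the chosen decision threshold.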

![image/png](https://cdn-uploads.huggingface.co/production/uploads/639daf827270667011153fbc/g-dLzxLBw1RAuiplvFCxh.png)

## Citation
**BibTeX:**
```bibtex
@misc{park2024communityforensics,
    title={Community Forensics: Using Thousands of Generators to Train Fake Image Detectors}, 
    author={Jeongsoo Park and Andrew Owens},
    year={2024},
    eprint={2411.04125},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2411.04125}, 
}
```

**Model Card Authors:** 

Jeongsoo Park, Andrew Owens