---
license: mit
---
# Unlocking the Hidden Potential of CLIP in Generalizable Deepfake Detection

[arXiv:2503.19683v1](https://arxiv.org/abs/2503.19683v1)

This repository contains the model for the paper:

**[Unlocking the Hidden Potential of CLIP in Generalizable Deepfake Detection](https://arxiv.org/abs/2503.19683v1)**
## Abstract
> This paper tackles the challenge of detecting partially manipulated facial deepfakes, which involve subtle alterations to specific facial features while retaining the overall context, posing a greater detection difficulty than fully synthetic faces. We leverage the Contrastive Language-Image Pre-training (CLIP) model, specifically its ViT-L/14 visual encoder, to develop a generalizable detection method that performs robustly across diverse datasets and unknown forgery techniques with minimal modifications to the original model. The proposed approach utilizes parameter-efficient fine-tuning (PEFT) techniques, such as LN-tuning, to adjust a small subset of the model's parameters, preserving CLIP's pre-trained knowledge and reducing overfitting. A tailored preprocessing pipeline optimizes the method for facial images, while regularization strategies, including L2 normalization and metric learning on a hyperspherical manifold, enhance generalization. Trained on the FaceForensics++ dataset and evaluated in a cross-dataset fashion on Celeb-DF-v2, DFDC, FFIW, and others, the proposed method achieves competitive detection accuracy comparable or outperforming much more complex state-of-the-art techniques. This work highlights the efficacy of CLIP's visual encoder in facial deepfake detection and establishes a simple, powerful baseline for future research, advancing the field of generalizable deepfake detection.
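
The L2 normalization mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' code: the function names, the `eps` guard, and the toy 3-dimensional vectors are assumptions made for illustration. It shows how scaling a feature vector to unit length places it on a hypersphere, where cosine similarity reduces to a dot product.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against zero vectors)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


def cosine_similarity(a, b):
    """After L2 normalization, cosine similarity is a plain dot product."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


# Hypothetical toy features; real CLIP ViT-L/14 features have far more dimensions.
feature = [3.0, 4.0, 0.0]
unit = l2_normalize(feature)
unit_norm = math.sqrt(sum(x * x for x in unit))

print("normalized feature:", unit)
print("norm after normalization:", unit_norm)
print("cosine similarity with a scaled copy:", cosine_similarity(feature, [6.0, 8.0, 0.0]))
```

Because every normalized feature has the same length, only the direction of a feature carries information, which is the setting in which metric learning on a hypersphere operates.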
## Results

Generalization of models trained on the FF++ dataset to unseen datasets and forgery methods. Reported values are **video-level AUROC**. Results for other methods are taken from their original papers; values marked with * are taken from other papers.

| Model            | Year | Publication | CDFv2 | DFD   | DFDC  | FFIW   | DSv1  |
|------------------|------|-------------|-------|-------|-------|--------|-------|
| LipForensics     | 2021 | CVPR        | 82.4  | --    | 73.5  | --     | --    |
| FTCN             | 2021 | ICCV        | 86.9  | --    | 74.0  | 74.47* | --    |
| RealForensics    | 2022 | CVPR        | 86.9  | --    | 75.9  | --     | --    |
| SBI              | 2022 | CVPR        | 93.18 | 82.68 | 72.42 | 84.83  | --    |
| AUNet            | 2023 | CVPR        | 92.77 | 99.22 | 73.82 | 81.45  | --    |
| StyleDFD         | 2024 | CVPR        | 89.0  | 96.1  | --    | --     | --    |
| LSDA             | 2024 | CVPR        | 91.1  | --    | 77.0  | 72.4*  | --    |
| LAA-Net          | 2024 | CVPR        | 95.4  | 98.43 | 86.94 | --     | --    |
| AltFreezing      | 2024 | CVPR        | 89.5  | 98.5  | 99.4  | --     | --    |
| NACO             | 2024 | ECCV        | 89.5  | --    | 76.7  | --     | --    |
| TALL++           | 2024 | IJCV        | 91.96 | --    | 78.51 | --     | --    |
| UDD              | 2025 | arXiv       | 93.13 | 95.51 | 81.21 | --     | --    |
| Effort           | 2025 | arXiv       | 95.6  | 96.5  | 84.3  | 92.1   | --    |
| KID              | 2025 | arXiv       | 95.74 | 99.46 | 75.77 | 82.53  | --    |
| ForensicsAdapter | 2025 | arXiv       | 95.7  | 97.2  | 87.2  | --     | --    |
| **Proposed**     | 2025 | arXiv       | 96.62 | 98.0  | 87.15 | 91.52  | 92.01 |
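
For readers unfamiliar with the metric, the sketch below shows one way a video-level AUROC can be computed from per-frame scores. It is an illustration under stated assumptions, not the evaluation code behind the numbers above: averaging frame scores per video is an assumed aggregation rule, and the function names, video ids, and scores are hypothetical.

```python
from collections import defaultdict


def video_level_scores(frame_scores):
    """Average per-frame fake probabilities into one score per video.

    `frame_scores` is an iterable of (video_id, score) pairs. Mean pooling
    is an assumption; other aggregation rules are possible.
    """
    grouped = defaultdict(list)
    for video_id, score in frame_scores:
        grouped[video_id].append(score)
    return {vid: sum(vals) / len(vals) for vid, vals in grouped.items()}


def auroc(labels, scores):
    """Probability that a fake video scores higher than a real one.

    Ties count as half a win (Mann-Whitney U formulation).
    """
    fake = [s for y, s in zip(labels, scores) if y == 1]
    real = [s for y, s in zip(labels, scores) if y == 0]
    if not fake or not real:
        raise ValueError("AUROC needs at least one real and one fake video")
    wins = sum((f > r) + 0.5 * (f == r) for f in fake for r in real)
    return wins / (len(fake) * len(real))


# Hypothetical toy data: (video_id, per-frame fake probability).
frames = [
    ("real_a", 0.10), ("real_a", 0.30),
    ("real_b", 0.40), ("real_b", 0.60),
    ("fake_a", 0.90), ("fake_a", 0.70),
    ("fake_b", 0.45), ("fake_b", 0.35),
]
video_labels = {"real_a": 0, "real_b": 0, "fake_a": 1, "fake_b": 1}

per_video = video_level_scores(frames)
ids = sorted(per_video)
result = auroc([video_labels[v] for v in ids], [per_video[v] for v in ids])
print(f"video-level AUROC: {100 * result:.2f}")
```

In this toy data one fake video scores below one real video, so three of the four real/fake pairs are ranked correctly. The table reports the same kind of quantity scaled to a 0 to 100 range.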
## Example

The code is available in our [GitHub](https://github.com/yermandy/deepfake-detection) project. See `inference.py`: it automatically downloads the model from [Hugging Face](https://huggingface.co/yermandy/deepfake-detection/tree/main) and runs inference on sample images. Make sure the required dependencies are installed before running the script.

``` bash
python inference.py
```

**❗ Important note**: the sample images are already preprocessed. To reproduce the results from the paper, preprocess your images with the DeepfakeBench [preprocessing](https://github.com/SCLBD/DeepfakeBench/blob/fb6171a8e1db2ae0f017d9f3a12be31fd9e0a3fb/preprocessing/preprocess.py) pipeline.
## Cite

``` bibtex
@article{yermakov-2025-deepfake-detection,
    title={Unlocking the Hidden Potential of CLIP in Generalizable Deepfake Detection},
    author={Andrii Yermakov and Jan Cech and Jiri Matas},
    year={2025},
    eprint={2503.19683},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2503.19683},
}
```