|
---
base_model:
- facebook/dinov2-base
library_name: transformers
license: mit
tags:
- dino
- vision
---
|
[[Paper]](https://openreview.net/forum?id=e3scLKNiNg&noteId=e3scLKNiNg) [[GitHub]](https://github.com/fra31/perceptual-metrics)
|
|
|
A robust perceptual metric based on the DINOv2 model `facebook/dinov2-base`.

Adversarially fine-tuned with FARE ([Schlarmann et al., 2024](https://arxiv.org/abs/2402.12336)) on ImageNet, using an ℓ∞ threat model with radius 4/255.
|
|
|
## Usage |
|
```python
import torch
from torchvision import transforms
from transformers import AutoModel

# Standard ImageNet preprocessing: bicubic resize, center crop, normalization.
preprocessor = transforms.Compose([
    transforms.Resize(256, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

model = AutoModel.from_pretrained("ch20/dinov2-base-fare4")
```
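
The embeddings can then be compared to score the perceptual similarity of two images. A minimal sketch, assuming cosine similarity over the pooled CLS embedding (an illustrative choice, not necessarily the exact evaluation protocol from the paper):

```python
from PIL import Image
import torch.nn.functional as F

@torch.no_grad()
def perceptual_similarity(img_a: Image.Image, img_b: Image.Image) -> float:
    # Embed both images with the robust backbone and compare the pooled
    # CLS embeddings via cosine similarity (assumed protocol); higher
    # values mean the images are perceptually more similar.
    batch = torch.stack([preprocessor(img_a), preprocessor(img_b)])
    emb = model(pixel_values=batch).pooler_output
    return F.cosine_similarity(emb[0:1], emb[1:2]).item()
```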
|
|
|
## Citation |
|
If you find this model useful, please consider citing our papers: |
|
```bibtex
@inproceedings{croce2024adversarially,
  title={Adversarially Robust CLIP Models Can Induce Better (Robust) Perceptual Metrics},
  author={Croce, Francesco and Schlarmann, Christian and Singh, Naman Deep and Hein, Matthias},
  year={2025},
  booktitle={{SaTML}}
}
```
|
|
|
```bibtex
@inproceedings{schlarmann2024robustclip,
  title={Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models},
  author={Schlarmann, Christian and Singh, Naman Deep and Croce, Francesco and Hein, Matthias},
  year={2024},
  booktitle={{ICML}}
}
```