Update README.md
README.md CHANGED
@@ -1,19 +1,25 @@
 ---
 library_name: transformers
-
+license: cc-by-nc-4.0
 ---
 
-# Model Card
+# Model Card: Safe-CLIP ViT-L-14
 
-
+Safe-CLIP, introduced in the paper [**Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models**](https://arxiv.org/abs/2311.16254), is an enhanced vision-and-language model designed to mitigate the risks associated with NSFW (Not Safe For Work) content in AI applications.
 
+Based on the CLIP model, Safe-CLIP is fine-tuned to sever the association between linguistic and visual concepts, ensuring safer outputs in text-to-image and image-to-text retrieval and generation tasks.
 
 
 ## Model Details
 
 ### Model Description
 
-
+Safe-CLIP is a fine-tuned version of the [CLIP](https://huggingface.co/docs/transformers/en/model_doc/clip) vision-and-language model. The fine-tuning is done on the ViSU (Visual Safe and Unsafe) Dataset, introduced in the same [paper](https://arxiv.org/abs/2311.16254).
+
+ViSU contains quadruplets of elements: safe texts, safe images, NSFW texts, NSFW images.
+
+
+
 
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
 
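Since the card declares `library_name: transformers` but the commit adds no usage snippet, below is a minimal sketch of how a Safe-CLIP checkpoint would typically be loaded and queried, the same way as any CLIP-style model. The repo id `aimagelab/safeclip_vit-l_14` and the example file name are assumptions for illustration, not part of this diff.

```python
# Minimal usage sketch (not part of the commit): loading a Safe-CLIP
# checkpoint with 🤗 transformers, exactly as for any CLIP-style model.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "aimagelab/safeclip_vit-l_14"  # assumed Hub repo id
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")  # any local image
texts = ["a safe caption", "another candidate caption"]

# Tokenize the texts and preprocess the image in one call
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-to-text similarity scores; higher means a better match
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)
```

Because Safe-CLIP keeps the CLIP architecture and only changes the fine-tuned weights, the standard `CLIPModel`/`CLIPProcessor` pipeline applies unchanged; only the checkpoint differs.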