---
library_name: transformers
license: cc-by-nc-4.0
---
# Model Card: Safe-CLIP
Safe-CLIP, introduced in the paper [**Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models**](https://arxiv.org/abs/2311.16254), is an enhanced vision-and-language model designed to mitigate the risks associated with NSFW (Not Safe For Work) content in AI applications.

Based on the CLIP model, Safe-CLIP is fine-tuned to sever the association between linguistic and visual NSFW concepts, ensuring safer outputs in text-to-image and image-to-text retrieval and generation tasks.
### Model Details
Safe-CLIP is a fine-tuned version of the [CLIP](https://huggingface.co/docs/transformers/en/model_doc/clip) vision-and-language model. The fine-tuning is performed on the ViSU (Visual Safe and Unsafe) Dataset, introduced in the same [paper](https://arxiv.org/abs/2311.16254).

ViSU contains quadruplets of elements: safe texts, safe images, NSFW texts, and NSFW images. You can find the <u>text portion</u> of the ViSU Dataset publicly released on the HuggingFace [ViSU-Text](https://huggingface.co/datasets/aimagelab/ViSU-Text) page; a sketch of how to load it follows below. We decided not to release the vision portion of the dataset due to the presence of extremely inappropriate images, which have the potential to cause harm and distress to individuals. Releasing this part of the dataset would therefore be irresponsible and contrary to the principles of safe and ethical use of AI technology.
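
A minimal sketch for inspecting the released text data with the `datasets` library; the repository id comes from the link above, while the split name and column layout are assumptions to be verified against the dataset card:

```python
>>> from datasets import load_dataset

>>> # split name and column layout are assumptions; check the dataset card
>>> visu_text = load_dataset("aimagelab/ViSU-Text", split="train")
>>> print(visu_text[0])  # expected: a safe caption paired with its NSFW counterpart
```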
**Variations** Safe-CLIP comes in four versions to improve compatibility with some of the most popular vision-and-language models employed for I2T and T2I generation tasks. More details are reported in the table below; a sketch of the Stable Diffusion pairing follows at the end of this section.

| Model name               | StableDiffusion compatibility | LLaVA compatibility |
|--------------------------|:-----------------------------:|:-------------------:|
| safe-CLIP ViT-L-14       | 1.4                           | ?                   |
| safe-CLIP ViT-L-14-336px | -                             | 1.5, 1.6            |
| safe-CLIP ViT-H-14       | -                             | -                   |
| safe-CLIP SD 2.0         | 2.0                           | -                   |

**Model Release Date** July 2024.
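
As an illustration of the compatibility table, the following hedged sketch swaps the Safe-CLIP text encoder into a Stable Diffusion 1.4 pipeline. It assumes the generic `transformers`/`diffusers` APIs (loading `CLIPTextModel` from a dual-encoder checkpoint and overriding the pipeline's `text_encoder`); it is not this repository's official recipe:

```python
>>> from diffusers import StableDiffusionPipeline
>>> from transformers import CLIPTextModel

>>> # assumption: extract only the text encoder from the Safe-CLIP checkpoint
>>> safe_text_encoder = CLIPTextModel.from_pretrained("aimagelab/safeclip_vit-l_14")

>>> # swap it into SD 1.4, per the first row of the table above
>>> pipe = StableDiffusionPipeline.from_pretrained(
...     "CompVis/stable-diffusion-v1-4", text_encoder=safe_text_encoder
... )
>>> image = pipe("a portrait photo of a person").images[0]
```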
## How to use
### Use with Transformers
See the snippet below for usage with Transformers. It loads Safe-CLIP together with its processor and computes image-text similarity scores for a sample image:

```python
>>> from PIL import Image
>>> import requests
>>> from transformers import CLIPModel, CLIPProcessor

>>> model_id = "aimagelab/safeclip_vit-l_14"
>>> model = CLIPModel.from_pretrained(model_id)
>>> processor = CLIPProcessor.from_pretrained(model_id)

>>> # fetch the sample image and define candidate captions
>>> url = "https://huggingface.co/front/assets/huggingface.png"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> texts = ["a photo of a cat", "a photo of a dog"]

>>> # one probability per caption; higher = better image-text match
>>> inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
>>> probs = model(**inputs).logits_per_image.softmax(dim=1)
```
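
The two encoders can also be called separately for retrieval-style use; `get_text_features` and `get_image_features` are part of the generic `CLIPModel` API in Transformers, and this sketch reuses `model`, `processor`, `texts`, and `image` from the snippet above:

```python
>>> import torch

>>> # embed texts and image separately, then compare them in the shared space
>>> with torch.no_grad():
...     text_emb = model.get_text_features(**processor(text=texts, return_tensors="pt", padding=True))
...     image_emb = model.get_image_features(**processor(images=image, return_tensors="pt"))
>>> sims = torch.nn.functional.cosine_similarity(text_emb, image_emb)
```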
### Direct Use