tobi1modna committed
Commit 412d1a7 · verified · 1 Parent(s): 88ce351

Update README.md

Files changed (1):
  1. README.md +24 -21

README.md CHANGED
@@ -3,45 +3,48 @@ library_name: transformers
  license: cc-by-nc-4.0
  ---

- # Model Card: Safe-CLIP ViT-L-14

  Safe-CLIP, introduced in the paper [**Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models**](https://arxiv.org/abs/2311.16254), is an enhanced vision-and-language model designed to mitigate the risks associated with NSFW (Not Safe For Work) content in AI applications.

  Based on the CLIP model, Safe-CLIP is fine-tuned to sever the association between linguistic and visual concepts, ensuring safer outputs in text-to-image and image-to-text retrieval and generation tasks.

- ## Model Details

- ### Model Description

- Safe-CLIP is a fine-tuned version of [CLIP](https://huggingface.co/docs/transformers/en/model_doc/clip) vision-and-language model. The model fine-tuning is done through the ViSU (Visual Safe and Unsafe) Dataset, introduced in the same [paper](https://arxiv.org/abs/2311.16254).

- ViSU contains quadruplets of elements: safe texts, safe images, NSFW texts, NSFW images.

- ![Safe-CLIP applied to downstream tasks](https://huggingface.co/aimagelab/safeclip_vit-l_14/blob/main/safe-CLIP_tasks.jpg)

- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]

- ### Model Sources [optional]

- <!-- Provide the basic links for the model. -->

- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

- ## Uses

- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

  ### Direct Use

  license: cc-by-nc-4.0
  ---

+ # Model Card: Safe-CLIP

  Safe-CLIP, introduced in the paper [**Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models**](https://arxiv.org/abs/2311.16254), is an enhanced vision-and-language model designed to mitigate the risks associated with NSFW (Not Safe For Work) content in AI applications.

  Based on the CLIP model, Safe-CLIP is fine-tuned to sever the association between linguistic and visual concepts, ensuring safer outputs in text-to-image and image-to-text retrieval and generation tasks.

+ ### Model Details

+ Safe-CLIP is a fine-tuned version of the [CLIP](https://huggingface.co/docs/transformers/en/model_doc/clip) vision-and-language model. The fine-tuning is performed on the ViSU (Visual Safe and Unsafe) Dataset, introduced in the same [paper](https://arxiv.org/abs/2311.16254).

+ ViSU contains quadruplets of elements: safe texts, safe images, NSFW texts, and NSFW images. The <u>text portion</u> of the ViSU Dataset is publicly released on the Hugging Face [ViSU-Text](https://huggingface.co/datasets/aimagelab/ViSU-Text) page. We decided not to release the vision portion of the dataset due to the presence of extremely inappropriate images, which have the potential to cause harm and distress to individuals. Consequently, releasing this part of the dataset would be irresponsible and contrary to the principles of ensuring the safe and ethical use of AI technology.
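
The released text split can be loaded with the 🤗 `datasets` library. The snippet below is only a minimal sketch: the configuration, split, and column names are not documented here, so verify them on the dataset page.

```python
>>> from datasets import load_dataset

>>> # Text-only portion of ViSU (the vision portion is intentionally not released).
>>> # Split and column names are assumptions -- check the dataset card before use.
>>> visu_text = load_dataset("aimagelab/ViSU-Text")
>>> print(visu_text)
```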

+ **Variations** Safe-CLIP comes in four versions to improve compatibility with some of the most popular vision-and-language models employed for image-to-text (I2T) and text-to-image (T2I) generation tasks. More details are reported in the table below.
+
+ |                          | StableDiffusion compatibility | LLaVA compatibility |
+ |--------------------------|:-----------------------------:|:-------------------:|
+ | safe-CLIP ViT-L-14       | 1.4                           | ?                   |
+ | safe-CLIP ViT-L-14-336px | -                             | 1.5, 1.6            |
+ | safe-CLIP ViT-H-14       | -                             | -                   |
+ | safe-CLIP SD 2.0         | 2.0                           | -                   |
+
+ **Model Release Date** July 2024.
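
To illustrate the StableDiffusion column above: the Safe-CLIP text encoder is intended as a drop-in replacement for the text encoder of a compatible generation pipeline. The snippet below is a minimal sketch rather than an official recipe; the `diffusers` component-override pattern and the `CompVis/stable-diffusion-v1-4` base checkpoint are assumptions, not stated in this README.

```python
>>> from diffusers import StableDiffusionPipeline
>>> from transformers import CLIPTextModel

>>> # Text tower of the Safe-CLIP checkpoint (vision weights are ignored here)
>>> safe_text_encoder = CLIPTextModel.from_pretrained("aimagelab/safeclip_vit-l_14")

>>> # Assumed Stable Diffusion 1.4 base, matching the compatibility table; swap in the safe encoder
>>> pipe = StableDiffusionPipeline.from_pretrained(
...     "CompVis/stable-diffusion-v1-4", text_encoder=safe_text_encoder
... )
```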

+ ## How to use

+ ### Use with Transformers

+ See the snippet below for usage with Transformers:
+ ```python
+ >>> import requests
+ >>> from PIL import Image
+ >>> from transformers import CLIPModel, CLIPProcessor
+
+ >>> model_id = "aimagelab/safeclip_vit-l_14"
+ >>> model = CLIPModel.from_pretrained(model_id)
+ >>> processor = CLIPProcessor.from_pretrained(model_id)
+
+ >>> # An example image and two candidate captions
+ >>> url = "https://huggingface.co/front/assets/huggingface.png"
+ >>> image = Image.open(requests.get(url, stream=True).raw)
+ >>> texts = ["a photo of a cat", "a photo of a dog"]
+
+ >>> # Image-text similarity scores
+ >>> inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
+ >>> probs = model(**inputs).logits_per_image.softmax(dim=-1)
+ ```
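
Beyond the joint forward pass, the standard `CLIPModel` helpers can be used to obtain text and image embeddings separately, e.g. for safe cross-modal retrieval. A minimal sketch continuing from the snippet above:

```python
>>> import torch

>>> # Encode texts and image into the shared embedding space
>>> with torch.no_grad():
...     text_emb = model.get_text_features(**processor(text=texts, return_tensors="pt", padding=True))
...     image_emb = model.get_image_features(**processor(images=image, return_tensors="pt"))
```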

  ### Direct Use