nielsr (HF staff) committed
Commit 6cffdc6 · 1 Parent(s): 415aa2d

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -4,13 +4,13 @@ license: apache-2.0

# SigLIP (base-sized model)

- SigLIP model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 224x224. It was introduced in the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Dosovitskiy et al. and first released in [this repository](https://github.com/google-research/vision_transformer). However, the weights were converted from the [timm repository](https://github.com/rwightman/pytorch-image-models) by Ross Wightman, who already converted the weights from JAX to PyTorch. Credits go to him.
+ SigLIP model pre-trained on WebLi at resolution 256x256. It was introduced in the paper [Sigmoid Loss for Language Image Pre-Training](https://arxiv.org/abs/2303.15343) by Zhai et al. and first released in [this repository](https://github.com/google-research/big_vision).

- Disclaimer: The team releasing ViT did not write a model card for this model so this model card has been written by the Hugging Face team.
+ Disclaimer: The team releasing SigLIP did not write a model card for this model so this model card has been written by the Hugging Face team.

## Model description

- SigLIP is [CLIP](https://huggingface.co/docs/transformers/model_doc/clip) with a better loss function. The sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows further scaling up the batch size, while also performing better at smaller batch sizes.
+ SigLIP is [CLIP](https://huggingface.co/docs/transformers/model_doc/clip), a multimodal model, with a better loss function. The sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows further scaling up the batch size, while also performing better at smaller batch sizes.

## Intended uses & limitations

@@ -27,8 +27,8 @@ import requests
from transformers import AutoProcessor, AutoModel
import torch

- model = AutoModel.from_pretrained("nielsr/siglip-base-patch16-224")
- processor = AutoProcessor.from_pretrained("nielsr/siglip-base-patch16-224")
+ model = AutoModel.from_pretrained("google/siglip-base-patch16-256")
+ processor = AutoProcessor.from_pretrained("google/siglip-base-patch16-256")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
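The usage snippet shown in the diff stops right after the image is downloaded. For context, a minimal sketch of how such a zero-shot classification example typically continues with the `transformers` SigLIP integration; the candidate captions, the `padding="max_length"` setting, and the final sigmoid step are assumptions based on the library's documented SigLIP usage, not part of this commit:

```python
from PIL import Image
import requests
import torch
from transformers import AutoProcessor, AutoModel

# checkpoint name taken from the updated README
model = AutoModel.from_pretrained("google/siglip-base-patch16-256")
processor = AutoProcessor.from_pretrained("google/siglip-base-patch16-256")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# illustrative candidate captions; SigLIP checkpoints expect "max_length" padding
texts = ["a photo of 2 cats", "a photo of 2 dogs"]
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# unlike CLIP, probabilities come from an element-wise sigmoid, not a softmax over texts
logits_per_image = outputs.logits_per_image
probs = torch.sigmoid(logits_per_image)
print(f"{probs[0][0]:.1%} that image 0 is '{texts[0]}'")
```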
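The updated model description says the sigmoid loss "operates solely on image-text pairs" and needs no batch-wide normalization. A minimal PyTorch sketch of that idea follows; it is an illustration of the pairwise loss described in the paper, not the actual big_vision or transformers implementation, and the function name, variable names, and the learnable scalars `t` and `b` are assumptions:

```python
import torch
import torch.nn.functional as F

def sigmoid_contrastive_loss(image_emb, text_emb, t, b):
    """Pairwise sigmoid loss sketch: each (image, text) pair in the batch is an
    independent binary classification problem, so no softmax over the full
    similarity matrix is required."""
    # image_emb, text_emb: (N, D) L2-normalized embeddings; t, b: learnable scalars
    logits = image_emb @ text_emb.t() * t + b
    # +1 on the diagonal (matching pairs), -1 everywhere else (non-matching pairs)
    labels = 2 * torch.eye(logits.size(0), device=logits.device) - 1
    return -F.logsigmoid(labels * logits).sum() / logits.size(0)
```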