JonaRuthardt committed (verified)
Commit 1a43751 · 1 Parent(s): 5a178fd

Updated README.md

Files changed (1):
  1. README.md (+88, -3)
README.md CHANGED
---
license: mit
base_model:
- meta-llama/Meta-Llama-3-8B
- facebook/dinov2-base
---

# ShareLock: Ultra-Lightweight CLIP-like Vision-Language Model

Welcome to the Hugging Face repository for **ShareLock**, an ultra-lightweight CLIP-like vision-language model. This repository hosts pretrained checkpoints for ShareLock, enabling easy integration into your projects.

ShareLock is introduced in the paper:
**"Do Better Language Models Have Crisper Vision?"**
*[Jona Ruthardt](https://jonaruthardt.github.io), [Gertjan J. Burghouts](https://gertjanburghouts.github.io), [Serge Belongie](https://sergebelongie.github.io), [Yuki M. Asano](https://yukimasano.github.io)*

🌐 **[Project Page](https://jonaruthardt.github.io/projects/ShareLock/)**
⌨️ **[GitHub Repository](https://github.com/JonaRuthardt/ShareLock)**
📄 **[Read the Paper on arXiv](https://arxiv.org/abs/2410.07173)**

---

## 🧠 Model Overview

**ShareLock** combines strong frozen features from unimodal vision and language models to achieve competitive multimodal performance with minimal resources.

### Key Highlights:
- **Ultra-Lightweight:** ShareLock is trained on only 563k image-caption pairs, requiring just 1 GPU hour.
- **Efficient Performance:** Achieves 51% zero-shot accuracy on ImageNet.
- **Plug-and-Play:** Easily integrates into downstream vision-language tasks.
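
Conceptually, ShareLock can be pictured as a small trainable head that aligns frozen, precomputed features from the two backbones with a CLIP-style contrastive objective. The sketch below only illustrates that idea and is not the authors' training code: the random tensors stand in for precomputed DINOv2 and Llama-3 features, and the head architecture, which side gets projected, and the temperature are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for precomputed, frozen backbone features (dimensions are illustrative:
# 768 matches dinov2-base, 4096 matches Meta-Llama-3-8B hidden states).
batch, vision_dim, text_dim = 32, 768, 4096
image_feats = torch.randn(batch, vision_dim)  # frozen vision features (not trained)
text_feats = torch.randn(batch, text_dim)     # frozen language features (not trained)

# A small trainable head aligns the language features with the vision feature space.
# Its exact depth/width and the temperature below are assumptions for illustration.
text_head = nn.Sequential(
    nn.Linear(text_dim, 1024), nn.GELU(), nn.Linear(1024, vision_dim)
)

# CLIP-style symmetric contrastive (InfoNCE) objective on the aligned embeddings.
img = F.normalize(image_feats, dim=-1)
txt = F.normalize(text_head(text_feats), dim=-1)
logits = img @ txt.T / 0.07
labels = torch.arange(batch)
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2
loss.backward()  # gradients flow only into the small head; both backbones stay frozen
```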

---

## 📂 Available Checkpoints

### Model Variants:
1. **ShareLock trained on CC3M** (Conceptual Captions 3M)
2. **ShareLock trained on CC12M** (Conceptual 12M)
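
Either checkpoint can be fetched programmatically with `huggingface_hub`. The `repo_id` and `filename` below are placeholders; the actual file names are listed in this repository's "Files and versions" tab.

```python
from huggingface_hub import hf_hub_download

# Placeholder repo_id/filename for illustration; substitute the actual values
# shown in the "Files and versions" tab of this repository.
checkpoint_path = hf_hub_download(
    repo_id="JonaRuthardt/ShareLock",
    filename="sharelock_cc12m.ckpt",
)
print(checkpoint_path)  # local path to the cached checkpoint file
```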

---

## 🚀 Usage

You can load ShareLock models using the `ShareLock` class directly for inference or fine-tuning:

### Example: Zero-shot Classification
```python
import torch

from sharelock.models.model import ShareLock

# Path to the checkpoint
checkpoint_path = "path/to/checkpoint.ckpt"
config = {
    # Add your configuration for model hyperparameters etc. here
}

# Load the ShareLock model
model = ShareLock.load_from_checkpoint(checkpoint_path, config=config)

# Encode text and images for multimodal tasks
image_embeddings = model.encode_image(your_image_tensor)
text_embeddings = model.encode_text(["a cat", "a dog"])

# Zero-shot classification: compare image and class-prompt embeddings by cosine
# similarity (this assumes both encoders return torch tensors of shape [N, D])
image_embeddings = torch.nn.functional.normalize(image_embeddings, dim=-1)
text_embeddings = torch.nn.functional.normalize(text_embeddings, dim=-1)
similarity = image_embeddings @ text_embeddings.T
predicted_class = similarity.argmax(dim=-1)  # index into ["a cat", "a dog"]
```
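
How `your_image_tensor` is produced depends on the preprocessing defined in the official repository; the snippet below shows only one plausible way to turn an image into a batched tensor with `torchvision`, assuming standard ImageNet statistics (commonly used with DINOv2 backbones). Check the GitHub code for the exact transforms.

```python
from PIL import Image
import torchvision.transforms as T

# Illustrative preprocessing only; the official repo defines the actual transforms.
# ImageNet mean/std normalization is assumed here.
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("path/to/image.jpg").convert("RGB")
your_image_tensor = preprocess(image).unsqueeze(0)  # shape: [1, 3, 224, 224]
```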

---

## 🛠️ Details

For training scripts, evaluation code, and further implementation details, visit our [GitHub repository](https://github.com/JonaRuthardt/ShareLock).

---

## 📜 Citation

If you use ShareLock in your research, please cite:

```bibtex
@article{ruthardt2024sharelock,
  title={Do Better Language Models Have Crisper Vision?},
  author={Jona Ruthardt and Gertjan J. Burghouts and Serge Belongie and Yuki M. Asano},
  journal={arXiv preprint arXiv:2410.07173},
  year={2024}
}
```

---

## 📧 Contact

For any questions or collaborations, feel free to reach out to [Jona Ruthardt](mailto:[email protected]).