jupyterjazz committed on
Commit
b259b10
·
1 Parent(s): 8ccc3e7

update README

Signed-off-by: jupyterjazz <[email protected]>

Files changed (1)
  1. README.md +110 -0
README.md ADDED
@@ -0,0 +1,110 @@
---
license: cc-by-nc-4.0
tags:
- vidore
- colpali
- multimodal-embedding
- multilingual-embedding
- Text-to-Visual Document (T→VD) retrieval
- feature-extraction
- sentence-similarity
- mteb
language:
- multilingual
library_name: transformers
pipeline_tag: visual-document-retrieval
---
<br><br>

<p align="center">
<img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px">
</p>

<p align="center">
<b>The embedding model trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
</p>

# Jina Embeddings v4: Universal Embeddings for Multimodal Multilingual Retrieval

[Original Model](https://huggingface.co/jinaai/jina-embeddings-v4) | [Blog](https://jina.ai/news/jina-embeddings-v4-universal-embeddings-for-multimodal-multilingual-retrieval) | [Technical Report](https://arxiv.org/abs/2506.18902) | [API](https://jina.ai/embeddings)

## Model Overview

This repository hosts a vLLM-compatible version of [`jina-embeddings-v4`](https://huggingface.co/jinaai/jina-embeddings-v4) with the code adapter merged into the base `Qwen2.5-VL` weights. This architecture modification enables native compatibility with vLLM without requiring custom adapter-handling code.

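
For background, merging a LoRA-style adapter into its base model is typically done with PEFT's `merge_and_unload`, roughly as sketched below. This is an illustrative sketch only, not the script used to produce this checkpoint; the base model ID and the adapter path are assumptions.

```python
# Rough sketch of a LoRA-adapter merge with PEFT (illustrative only).
# The base model ID and adapter path are assumptions, not the actual
# artifacts behind this repository.
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration

base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct", torch_dtype="auto"
)
model = PeftModel.from_pretrained(base, "<path_to_code_adapter>")
merged = model.merge_and_unload()  # fold the adapter deltas into the base weights
merged.save_pretrained("./jina-embeddings-v4-code-merged")
```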

## Usage

```python
import torch
from PIL import Image

from vllm import LLM
from vllm.config import PoolerConfig
from vllm.inputs.data import TextPrompt

# Initialize model
model = LLM(
    model="jinaai/jina-embeddings-v4-vllm-code",
    task="embed",
    enforce_eager=True,
    override_pooler_config=PoolerConfig(pooling_type="ALL", normalize=False),
    dtype="float16",
)

# Create text prompts
query = "Find a function that prints a greeting message to the console"
query_prompt = TextPrompt(
    prompt=f"Query: {query}"
)

passage = "def hello_world():\n print('Hello, World!')"
passage_prompt = TextPrompt(
    prompt=f"Passage: {passage}"
)

# Create image prompt
image = Image.open("<path_to_image>")
image_prompt = TextPrompt(
    prompt="<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>Describe the image.<|im_end|>\n",
    multi_modal_data={"image": image},
)

# Encode all prompts
prompts = [query_prompt, passage_prompt, image_prompt]
outputs = model.encode(prompts)


def get_embeddings(outputs):
    VISION_START_TOKEN_ID, VISION_END_TOKEN_ID = 151652, 151653

    embeddings = []
    for output in outputs:
        if VISION_START_TOKEN_ID in output.prompt_token_ids:
            # Gather only vision tokens
            img_start_pos = torch.where(
                torch.tensor(output.prompt_token_ids) == VISION_START_TOKEN_ID
            )[0][0]
            img_end_pos = torch.where(
                torch.tensor(output.prompt_token_ids) == VISION_END_TOKEN_ID
            )[0][0]
            embeddings_tensor = output.outputs.data.detach().clone()[
                img_start_pos : img_end_pos + 1
            ]
        else:
            # Use all tokens for text-only prompts
            embeddings_tensor = output.outputs.data.detach().clone()

        # Mean-pool over tokens and normalize the embedding
        pooled_output = (
            embeddings_tensor.sum(dim=0, dtype=torch.float32)
            / embeddings_tensor.shape[0]
        )
        embeddings.append(torch.nn.functional.normalize(pooled_output, dim=-1))
    return embeddings


embeddings = get_embeddings(outputs)
```
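
The vectors returned by `get_embeddings` are L2-normalized, so cosine similarity reduces to a dot product. As a quick follow-up (not part of the original snippet), you could score the code passage and the image against the query like this:

```python
# Results come back in prompt order: query, code passage, image.
query_emb, passage_emb, image_emb = embeddings

# The vectors are already L2-normalized, so the dot product equals cosine similarity.
print("query vs. code passage:", torch.dot(query_emb, passage_emb).item())
print("query vs. image:", torch.dot(query_emb, image_emb).item())
```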