Update README.md
README.md
CHANGED
@@ -60,7 +60,6 @@ language:
|
|
60 |
- my
|
61 |
- ne
|
62 |
- nl
|
63 |
-
- 'no'
|
64 |
- pa
|
65 |
- pl
|
66 |
- pt
|
@@ -110,46 +109,50 @@ Arctic Embed 2.0 introduces a new standard for multilingual embedding models, co
|
|
110 |
Released under the permissive Apache 2.0 license, Arctic Embed 2.0 is ideal for applications that demand reliable, enterprise-grade multilingual search and retrieval at scale.
|
111 |
|
112 |
Key Features:
|
113 |
-
|
114 |
-
|
115 |
-
|
116 |
-
|
|
|
|
|
|
|
|
|
117 |
|
118 |
|
119 |
### Quality Benchmarks
|
120 |
Unlike most other open-source models, Arctic-embed-l-v2.0 excels at both English retrieval (measured on MTEB Retrieval) and multilingual retrieval (measured on MIRACL and CLEF).
|
121 |
-
You no longer need to support separate models to get high-quality English and multilingual retrieval.
|
122 |
|
123 |
| Model Name | # params | # non-emb params | # dimensions | BEIR (15) | MIRACL (4) | CLEF (Focused) | CLEF (Full) |
|
124 |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|
125 |
-
| me5 base | 560M | 303M | 1024 |
|
126 |
-
| bge-m3 (BAAI) | 568M | 303M | 1024 |
|
127 |
-
| gte (Alibaba) | 305M | 113M | 768 |
|
128 |
-
| Arctic-M
|
129 |
-
| snowflake-arctic-m | 335M | 303M | 1024 | 0
|
130 |
-
| me5 base | 560M | 303M | 1024 |
|
131 |
-
| bge-m3 (BAAI) | 568M | 303M | 1024 |
|
132 |
-
| gte (Alibaba) | 305M | 113M | 768 |
|
133 |
-
| snowflake-arctic-m | 109M | 86M | 768 |
|
134 |
-
| snowflake-arctic-l | 335M | 303M | 1024 | 0
|
135 |
-
| snowflake-arctic-m-v2.0 | 305M | 113M | 768 |
|
136 |
-
| snowflake-arctic-l-v2.0 | 568M | 303M | 1024 |
|
137 |
|
138 |
Aside from high-quality retrieval, Arctic delivers embeddings that are easily compressible. Use vector truncation via MRL to decrease vector size by 3-4x with less than 3% degradation in quality.
|
139 |
Combine MRL-truncated vectors with Int4 quantization to power retrieval in 128 bytes per document (256 dimensions at 4 bits each).
|
140 |
|
141 |
| Model | # dimensions | BEIR (15) | Relative Performance | MIRACL (4) | Relative Performance | CLEF (Focused) | Relative Performance | CLEF (Full) | Relative Performance |
|
142 |
|---|---|:---:|:---:|:---:|:---:|:---:|---|---|---|
|
143 |
-
| snowflake-arctic-l-v2.0 | 1024 |
|
144 |
-
| snowflake-arctic-l-v2.0 | 256 |
|
145 |
-
| snowflake-arctic-m-v2.0 | 768 |
|
146 |
-
| snowflake-arctic-m-v2.0 | 256 |
|
147 |
|
148 |
## Usage
|
149 |
|
150 |
### Using Sentence Transformers
|
151 |
|
152 |
-
|
153 |
from sentence_transformers import SentenceTransformer
|
154 |
|
155 |
# Load the model
|
@@ -176,6 +179,7 @@ for query, query_scores in zip(queries, scores):
|
|
176 |
print(score, document)
|
177 |
|
178 |
```
|
|
|
179 |
### Using Huggingface Transformers
|
180 |
|
181 |
|
|
|
60 |
- my
|
61 |
- ne
|
62 |
- nl
|
|
|
63 |
- pa
|
64 |
- pl
|
65 |
- pt
|
|
|
109 |
Released under the permissive Apache 2.0 license, Arctic Embed 2.0 is ideal for applications that demand reliable, enterprise-grade multilingual search and retrieval at scale.
|
110 |
|
111 |
Key Features:
|
112 |
+
|
113 |
+
1. Multilingual without compromise: Excels in English and non-English retrieval, outperforming leading open-source and proprietary models on benchmarks like MTEB Retrieval, CLEF, and MIRACL.
|
114 |
+
|
115 |
+
2. Inference efficiency: With roughly 300M non-embedding parameters, inference is fast and efficient at any scale.
|
116 |
+
|
117 |
+
3. Compression-friendly: Achieves high-quality retrieval with embeddings as small as 128 bytes/vector using Matryoshka Representation Learning (MRL) and quantization-aware embedding training.
|
118 |
+
|
119 |
+
4. Drop-In Replacement: arctic-embed-l-v2.0 builds on [XLM-R Large](https://huggingface.co/FacebookAI/xlm-roberta-large), which allows direct drop-in inference replacement with any libraries, kernels, inference engines, etc. that support that architecture.
|
120 |
|
121 |
|
122 |
### Quality Benchmarks
|
123 |
Unlike most other open-source models, Arctic-embed-l-v2.0 excels at both English retrieval (measured on MTEB Retrieval) and multilingual retrieval (measured on MIRACL and CLEF).
|
124 |
+
You no longer need to support separate models to get high-quality English and multilingual retrieval. All numbers below are the average NDCG@10 across the dataset in question.
|
125 |
|
126 |
| Model Name | # params | # non-emb params | # dimensions | BEIR (15) | MIRACL (4) | CLEF (Focused) | CLEF (Full) |
|
127 |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|
128 |
+
| me5 base | 560M | 303M | 1024 | 51.4 | 54.0 | 43.0 | 34.6 |
|
129 |
+
| bge-m3 (BAAI) | 568M | 303M | 1024 | 48.8 | 56.8 | 40.8 | 41.3 |
|
130 |
+
| gte (Alibaba) | 305M | 113M | 768 | 51.1 | 52.3 | 47.7 | 53.1 |
|
131 |
+
| Arctic-M (v1.0) | 109M | 86M | 768 | 54.9 | 24.9 | 34.4 | 29.1 |
|
132 |
+
| Arctic-L (v1.0) | 335M | 303M | 1024 | 56.0 | 34.8 | 38.2 | 33.7 |
|
133 |
+
| me5 base | 560M | 303M | 1024 | 51.4 | 54.0 | 43.0 | 34.6 |
|
134 |
+
| bge-m3 (BAAI) | 568M | 303M | 1024 | 48.8 | 56.8 | 40.8 | 41.3 |
|
135 |
+
| gte (Alibaba) | 305M | 113M | 768 | 51.1 | 52.3 | 47.7 | 53.1 |
|
136 |
+
| snowflake-arctic-m | 109M | 86M | 768 | 54.9 | 24.9 | 34.4 | 29.1 |
|
137 |
+
| snowflake-arctic-l | 335M | 303M | 1024 | 56.0 | 34.8 | 38.2 | 33.7 |
|
138 |
+
| snowflake-arctic-m-v2.0 | 305M | 113M | 768 | 55.4 | 55.2 | 51.7 | 53.9 |
|
139 |
+
| **snowflake-arctic-l-v2.0** | 568M | 303M | 1024 | 55.6 | 55.8 | 52.9 | **54.3** |
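The scores in the table above are reported as average NDCG@10. As a rough illustration of what that metric measures for a single query, here is a minimal sketch. It assumes linear gain (some evaluation tools use exponential gain instead), and the relevance grades in the example are hypothetical, not taken from any benchmark.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance grades (linear gain assumed)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """NDCG@k: DCG of the system ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical query: relevance grades of the retrieved documents, in ranked order.
ranked = [1, 0, 1, 0, 0]
# Grades of every judged document for this query (here, the same five documents).
score = ndcg_at_k(ranked, ranked)
print(round(score, 4))
```

A benchmark score is then the mean of this per-query value over all queries, usually multiplied by 100.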
|
140 |
|
141 |
Aside from high-quality retrieval, Arctic delivers embeddings that are easily compressible. Use vector truncation via MRL to decrease vector size by 3-4x with less than 3% degradation in quality.
|
142 |
Combine MRL-truncated vectors with Int4 quantization to power retrieval in 128 bytes per document (256 dimensions at 4 bits each).
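To make the 128-byte figure concrete, the following sketch truncates a 1024-dimensional vector to its first 256 values, re-normalizes it, and packs each value into 4 bits. The min-max scalar quantizer here is an assumption chosen for simplicity, not necessarily the scheme behind the reported numbers, and the input vector is synthetic.

```python
import math

def truncate_and_normalize(vec, dims=256):
    """MRL-style truncation: keep the first `dims` values, then rescale to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

def quantize_int4(vec):
    """Map each float to one of 16 levels (0..15) and pack two levels per byte."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 for a constant vector
    levels = [min(15, max(0, round((x - lo) / scale))) for x in vec]
    if len(levels) % 2:
        levels.append(0)  # pad so that values pair up evenly
    packed = bytes((levels[i] << 4) | levels[i + 1] for i in range(0, len(levels), 2))
    return packed, lo, scale

# Synthetic stand-in for a 1024-dimensional model output.
full = [math.sin(i) for i in range(1024)]
small = truncate_and_normalize(full, dims=256)
packed, offset, scale = quantize_int4(small)
print(len(small), len(packed))  # 256 values stored in 128 bytes
```

In this sketch the offset and scale are kept per vector, which costs a few extra bytes beyond the 128; a shared, corpus-wide range would avoid that overhead.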
|
143 |
|
144 |
| Model | # dimensions | BEIR (15) | Relative Performance | MIRACL (4) | Relative Performance | CLEF (Focused) | Relative Performance | CLEF (Full) | Relative Performance |
|
145 |
|---|---|:---:|:---:|:---:|:---:|:---:|---|---|---|
|
146 |
+
| snowflake-arctic-l-v2.0 | 1024 | 55.6 | N/A | 55.8 | N/A | 52.9 | N/A | 54.3 | N/A |
|
147 |
+
| snowflake-arctic-l-v2.0 | 256 | 54.3 | -0.18% | 54.3 | -2.70% | 51.9 | -1.81% | 53.4 | -1.53% |
|
148 |
+
| snowflake-arctic-m-v2.0 | 768 | 55.4 | N/A | 55.2 | N/A | 51.7 | N/A | 53.9 | N/A |
|
149 |
+
| snowflake-arctic-m-v2.0 | 256 | 54.4 | -1.81% | 54.0 | -2.17% | 50.6 | -2.13% | 52.3 | -3.06% |
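The "Relative Performance" columns above can be read as the percentage change of the truncated-vector score against the full-size score. A minimal sketch of that calculation, assuming this definition, follows. Because it starts from the rounded scores shown in the table, the result can differ from the listed percentages in the last digit.

```python
def relative_change(truncated_score, full_score):
    """Percentage change of the truncated-vector score relative to the full-size score."""
    return (truncated_score - full_score) / full_score * 100

# MIRACL (4) scores for snowflake-arctic-l-v2.0 at 256 and 1024 dimensions, from the table above.
change = relative_change(54.3, 55.8)
print(f"{change:+.2f}%")
```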
|
150 |
|
151 |
## Usage
|
152 |
|
153 |
### Using Sentence Transformers
|
154 |
|
155 |
+
```python
|
156 |
from sentence_transformers import SentenceTransformer
|
157 |
|
158 |
# Load the model
|
|
|
179 |
print(score, document)
|
180 |
|
181 |
```
|
182 |
+
|
183 |
### Using Huggingface Transformers
|
184 |
|
185 |
|