spacemanidol committed · Commit 11a03be · verified · Parent(s): 9fd85e7

Update README.md

Files changed (1): README.md (+27 -23)

README.md CHANGED
@@ -60,7 +60,6 @@ language:
  - my
  - ne
  - nl
- - 'no'
  - pa
  - pl
  - pt
@@ -110,46 +109,50 @@ Arctic Embed 2.0 introduces a new standard for multilingual embedding models, co
  Released under the permissive Apache 2.0 license, Arctic Embed 2.0 is ideal for applications that demand reliable, enterprise-grade multilingual search and retrieval at scale.
 
  Key Features:
- Multilingual without compromise: Excels in English and non-English retrieval, outperforming leading open-source and proprietary models on benchmarks like MTEB Retrieval, CLEF, and MIRACL.
- Inference efficiency: With its 300m non-embedding parameters inference is fast and efficient for any scale.
- Compression-friendly: Achieves high-quality retrieval with embeddings as small as 128 bytes/vector using Matryoshka Representation Learning (MRL) and quantization-aware embedding training.
- Drop-In Replacement: arctic-embed-l-v2.0 builds on [XMLR-Large](https://huggingface.co/FacebookAI/xlm-roberta-large) which allows direct drop-in inference replacement with any form of new libraries, kernels, inferene engines etc.
+
+ 1. Multilingual without compromise: Excels in English and non-English retrieval, outperforming leading open-source and proprietary models on benchmarks like MTEB Retrieval, CLEF, and MIRACL.
+
+ 2. Inference efficiency: With only ~300M non-embedding parameters, inference is fast and efficient at any scale.
+
+ 3. Compression-friendly: Achieves high-quality retrieval with embeddings as small as 128 bytes per vector using Matryoshka Representation Learning (MRL) and quantization-aware embedding training (see the compression sketch below).
+
+ 4. Drop-In Replacement: arctic-embed-l-v2.0 builds on [XLM-R Large](https://huggingface.co/FacebookAI/xlm-roberta-large), allowing direct drop-in replacement with existing libraries, kernels, inference engines, etc.
 
 
  ### Quality Benchmarks
  Unlike most other open-source models, Arctic-embed-l-v2.0 excels across English (via MTEB Retrieval) and multilingual (via MIRACL and CLEF).
- You no longer need to support models to empower high-quality English and multilingual retrieval.
+ You no longer need to maintain separate models for high-quality English and multilingual retrieval. All numbers below are the average NDCG@10 across each benchmark's datasets.
 
  | Model Name | # params | # non-emb params | # dimensions | BEIR (15) | MIRACL (4) | CLEF (Focused) | CLEF (Full) |
  |---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
- | me5 base | 560M | 303M | 1024 | 0.514 | 0.540 | 0.430 | 0.346 |
- | bge-m3 (BAAI) | 568M | 303M | 1024 | 0.488 | 0.568 | 0.408 | 0.413 |
- | gte (Alibaba) | 305M | 113M | 768 | 0.511 | 0.523 | 0.477 | 0.531 |
- | snowflake-arctic-m | 109M | 86M | 768 | 0.549 | 0.249 | 0.344 | 0.291 |
- | snowflake-arctic-l | 335M | 303M | 1024 | 0.560 | 0.348 | 0.382 | 0.337 |
- | snowflake-arctic-m-v2.0 | 305M | 113M | 768 | 0.554 | 0.552 | 0.517 | 0.539 |
- | snowflake-arctic-l-v2.0 | 568M | 303M | 1024 | 0.556 | 0.558 | 0.529 | 0.543 |
+ | me5 base | 560M | 303M | 1024 | 51.4 | 54.0 | 43.0 | 34.6 |
+ | bge-m3 (BAAI) | 568M | 303M | 1024 | 48.8 | 56.8 | 40.8 | 41.3 |
+ | gte (Alibaba) | 305M | 113M | 768 | 51.1 | 52.3 | 47.7 | 53.1 |
+ | snowflake-arctic-m (v1.0) | 109M | 86M | 768 | 54.9 | 24.9 | 34.4 | 29.1 |
+ | snowflake-arctic-l (v1.0) | 335M | 303M | 1024 | 56.0 | 34.8 | 38.2 | 33.7 |
+ | snowflake-arctic-m-v2.0 | 305M | 113M | 768 | 55.4 | 55.2 | 51.7 | 53.9 |
+ | **snowflake-arctic-l-v2.0** | 568M | 303M | 1024 | 55.6 | 55.8 | 52.9 | **54.3** |
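For reference, the NDCG@10 figures above are the standard rank-discounted retrieval metric. The card itself does not spell out the formula; one common formulation (some implementations use plain rel_i in place of the exponential gain) is:

```latex
\mathrm{DCG@10}  = \sum_{i=1}^{10} \frac{2^{\mathrm{rel}_i} - 1}{\log_2(i + 1)},
\qquad
\mathrm{NDCG@10} = \frac{\mathrm{DCG@10}}{\mathrm{IDCG@10}}
```

where rel_i is the graded relevance of the document at rank i and IDCG@10 is the DCG@10 of the ideal ordering.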
 
  Aside from high-quality retrieval, Arctic delivers embeddings that are easily compressible. Leverage vector truncation via MRL to decrease vector size by 3-4x with less than 3% degradation in quality.
  Combine MRL-truncated vectors with vector compression (Int4) to power retrieval at 128 bytes per doc.
 
  | Model | Dimensions | BEIR (15) | Relative Performance | MIRACL (4) | Relative Performance | CLEF (5) | Relative Performance | CLEF (Full) | Relative Performance |
  |---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
- | snowflake-arctic-l-v2.0 | 1024 | 0.556 | N/A | 0.558 | N/A | 0.529 | N/A | 0.543 | N/A |
- | snowflake-arctic-l-v2.0 | 256 | 0.543 | -0.18% | 0.543 | -2.70% | 0.519 | -1.81% | 0.534 | -1.53% |
- | snowflake-arctic-m-v2.0 | 768 | 0.554 | N/A | 0.552 | N/A | 0.517 | N/A | 0.539 | N/A |
- | snowflake-arctic-m-v2.0 | 256 | 0.544 | -1.81% | 0.54 | -2.17% | 0.506 | -2.13% | 0.523 | -3.06% |
+ | snowflake-arctic-l-v2.0 | 1024 | 55.6 | N/A | 55.8 | N/A | 52.9 | N/A | 54.3 | N/A |
+ | snowflake-arctic-l-v2.0 | 256 | 54.3 | -2.34% | 54.3 | -2.70% | 51.9 | -1.81% | 53.4 | -1.53% |
+ | snowflake-arctic-m-v2.0 | 768 | 55.4 | N/A | 55.2 | N/A | 51.7 | N/A | 53.9 | N/A |
+ | snowflake-arctic-m-v2.0 | 256 | 54.4 | -1.81% | 54.0 | -2.17% | 50.6 | -2.13% | 52.3 | -3.06% |
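To make the 128-bytes-per-document arithmetic concrete: truncating to 256 dimensions and storing each value in 4 bits gives 256 × 4 bits = 128 bytes. Below is a minimal sketch of that storage math; the symmetric quantization scheme is an illustrative assumption, not the card's recipe:

```python
import numpy as np

def compress_mrl_int4(embeddings: np.ndarray, dims: int = 256) -> np.ndarray:
    """Truncate MRL embeddings to `dims` and pack them as Int4 codes."""
    # 1) MRL truncation: keep the leading components, then re-normalize.
    truncated = embeddings[:, :dims].astype(np.float32)
    truncated /= np.linalg.norm(truncated, axis=1, keepdims=True)
    # 2) Symmetric Int4 quantization (illustrative): map values to [-8, 7].
    scale = np.abs(truncated).max()
    q = np.clip(np.round(truncated / scale * 7), -8, 7).astype(np.int8)
    # 3) Pack two 4-bit codes per byte: 256 dims * 4 bits = 128 bytes/vector.
    nibbles = (q + 8).astype(np.uint8)           # shift into [0, 15]
    return (nibbles[:, 0::2] << 4) | nibbles[:, 1::2]

# Example with random stand-ins for 1024-dim arctic-l-v2.0 vectors.
vectors = np.random.randn(4, 1024).astype(np.float32)
codes = compress_mrl_int4(vectors)
print(codes.shape)  # (4, 128) -> 128 bytes per document
```

A production setup would calibrate the scale offline and score with asymmetric distance against the packed codes; the point here is only the byte count.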
 
  ## Usage
 
  ### Using Sentence Transformers
 
- ``
+ ```python
  from sentence_transformers import SentenceTransformer
 
  # Load the model
@@ -176,6 +179,7 @@ for query, query_scores in zip(queries, scores):
  print(score, document)
 
  ```
+
  ### Using Huggingface Transformers
 
 
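The diff hunk ends before the Transformers snippet itself. As a hedged illustration of the drop-in claim above (the model is architecturally XLM-R Large, so vanilla AutoModel loading should suffice), here is a minimal sketch; the "query: " prefix and CLS-token pooling are assumptions based on the arctic-embed family, not lines from this commit:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "Snowflake/snowflake-arctic-embed-l-v2.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

# Query prefix and pooling are assumed here; check the model card's
# official snippet before relying on them.
queries = ["query: what is snowflake?"]
documents = ["The Data Cloud!", "Mexico City of Course!"]

def embed(texts):
    tokens = tokenizer(texts, padding=True, truncation=True,
                       max_length=8192, return_tensors="pt")
    with torch.no_grad():
        out = model(**tokens)[0][:, 0]          # CLS-token embedding
    return torch.nn.functional.normalize(out, p=2, dim=1)

scores = embed(queries) @ embed(documents).T    # cosine similarity
print(scores)
```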