spacemanidol committed · Commit 9fd85e7 · verified · 1 Parent(s): f9400d2

Update README.md

Files changed (1): README.md (+43 −22)
README.md CHANGED
@@ -89,6 +89,7 @@ language:
  <h1 align="center">Snowflake's Arctic-embed-l-v2.0</h1>
  <h4 align="center">
  <p>
  <a href=#models>Models</a> |
  <a href=#usage>Usage</a> |
  <a href="#evaluation">Evaluation</a> |
@@ -100,16 +101,24 @@ language:
  </h4>


  ## Models

- MIRACL (4) Voyage misc. (9) CLEF (5) CLEF, max context length Multilingual CLEF
- Snowflake's snowflake-arctic-embed-l-v2.0 is a multilingual text embedding model that focuses on providing
- BEIR
- 0.556 0.558 0.655 0.529 0.541 0.543
- 0.543 0.543 0.644 0.519 0.528 0.534

- Focused on

  | Model Name | # params | # non-emb params | # dimensions | BEIR (15) | MIRACL (4) | CLEF (Focused) | CLEF (Full) |
  |---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
@@ -126,7 +135,8 @@ Focused on
  | snowflake-arctic-m-v2.0 | 305M | 113M | 768 | 0.554 | 0.552 | 0.517 | 0.539 |
  | snowflake-arctic-l-v2.0 | 568M | 303M | 1024 | 0.556 | 0.558 | 0.529 | 0.543 |

- MRL

  | Model | | BEIR (15) | Relative Performance | MIRACL (4) | Relative Performance | CLEF (5) | Relative Performance | CLEF (Full) | Relative Performance |
  |---|---|:---:|:---:|:---:|:---:|:---:|---|---|---|
@@ -135,30 +145,41 @@ MRL
  | snowflake-arctic-m-v2.0 | 768 | 0.554 | N/A | 0.552 | N/A | 0.517 | N/A | 0.539 | N/A |
  | snowflake-arctic-m-v2.0 | 256 | 0.544 | -1.81% | 0.54 | -2.17% | 0.506 | -2.13% | 0.523 | -3.06% |

- The `snowflake-arctic-embedding` models achieve **state-of-the-art performance on the MTEB/BEIR leaderboard** for each of their size variants. Evaluation is performed using these [scripts](https://github.com/Snowflake-Labs/snowflake-arctic-embed/tree/main/src). As shown below, each class of model size achieves SOTA retrieval accuracy compared to other top models.
-
-
- The models are trained by leveraging existing open-source text representation models, such as bert-base-uncased, and are trained in a multi-stage pipeline to optimize their retrieval performance. First, the models are trained with large batches of query-document pairs where negatives are derived in-batch—pretraining leverages about 400m samples of a mix of public datasets and proprietary web search data. Following pretraining, models are further optimized with long training on a smaller dataset (about 1m samples) of triplets of query, positive document, and negative document derived from hard negative mining. Mining of the negatives and data curation is crucial to retrieval accuracy. A detailed technical report can be found [here](https://arxiv.org/abs/2405.05374).

- | Name | MTEB Retrieval Score (NDCG @ 10) | Parameters (Millions) | Embedding Dimension |
- | ----------------------------------------------------------------------- | -------------------------------- | --------------------- | ------------------- |
- | [snowflake-arctic-embed-xs](https://huggingface.co/Snowflake/snowflake-arctic-embed-xs/) | 50.15 | 22 | 384 |
- | [snowflake-arctic-embed-s](https://huggingface.co/Snowflake/snowflake-arctic-embed-s/) | 51.98 | 33 | 384 |
- | [snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m/) | 54.90 | 110 | 768 |
- | [snowflake-arctic-embed-m-long](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-long/) | 54.83 | 137 | 768 |
- | [snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l/) | 55.98 | 335 | 1024 |

- ## Usage

- ### Using Huggingface transformers

- You can use the transformers package to use a snowflake-arctic-embed model, as shown below. For optimal retrieval quality, use the CLS token to embed each text portion and use the query prefix below (just on the query).

  ```python
  import torch
@@ -169,7 +190,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModel.from_pretrained(model_name, add_pooling_layer=False)
  model.eval()

- query_prefix = 'Represent this sentence for searching relevant passages: '
  queries = ['what is snowflake?', 'Where can I get the best tacos?']
  queries_with_prefix = ["{}{}".format(query_prefix, i) for i in queries]
  query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt', max_length=512)
 
  <h1 align="center">Snowflake's Arctic-embed-l-v2.0</h1>
  <h4 align="center">
  <p>
+ <a href=#news>News</a> |
  <a href=#models>Models</a> |
  <a href=#usage>Usage</a> |
  <a href="#evaluation">Evaluation</a> |
 
  </h4>


+ ## News
+ 12/04/2024: Release of `snowflake-arctic-embed-l-v2.0` and `snowflake-arctic-embed-m-v2.0`, our newest models, designed with multilingual workloads in mind.
+
  ## Models
+ Snowflake arctic-embed-l-v2.0 is the newest addition to the suite of embedding models Snowflake has released, optimized for retrieval performance and inference efficiency.
+ Arctic Embed 2.0 sets a new standard for multilingual embedding models, delivering high-quality multilingual text retrieval without sacrificing performance in English.
+ Released under the permissive Apache 2.0 license, Arctic Embed 2.0 is ideal for applications that demand reliable, enterprise-grade multilingual search and retrieval at scale.
 
+ Key Features:
+
+ 1. Multilingual without compromise: Excels in English and non-English retrieval, outperforming leading open-source and proprietary models on benchmarks like MTEB Retrieval, CLEF, and MIRACL.
+ 2. Inference efficiency: With roughly 300M non-embedding parameters, inference is fast and efficient at any scale.
+ 3. Compression-friendly: Achieves high-quality retrieval with embeddings as small as 128 bytes per vector using Matryoshka Representation Learning (MRL) and quantization-aware embedding training.
+ 4. Drop-In Replacement: arctic-embed-l-v2.0 builds on [XLM-R Large](https://huggingface.co/FacebookAI/xlm-roberta-large), which allows direct drop-in inference replacement with any library, kernel, or inference engine that already supports that architecture.

+ ### Quality Benchmarks
+ Unlike most other open-source models, Arctic-embed-l-v2.0 excels across English (via MTEB Retrieval) and multilingual (via MIRACL and CLEF) retrieval.
+ You no longer need to support multiple models to deliver high-quality English and multilingual retrieval.
 
  | Model Name | # params | # non-emb params | # dimensions | BEIR (15) | MIRACL (4) | CLEF (Focused) | CLEF (Full) |
  |---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|

  | snowflake-arctic-m-v2.0 | 305M | 113M | 768 | 0.554 | 0.552 | 0.517 | 0.539 |
  | snowflake-arctic-l-v2.0 | 568M | 303M | 1024 | 0.556 | 0.558 | 0.529 | 0.543 |
 
+ Aside from high-quality retrieval, Arctic delivers embeddings that are easily compressible. Leverage vector truncation via MRL to decrease vector size by 3-4x with less than 3% degradation in quality.
+ Combine MRL-truncated vectors with vector compression (Int4) to power retrieval in as little as 128 bytes per document (see the sketch after the table below).
 
  | Model | | BEIR (15) | Relative Performance | MIRACL (4) | Relative Performance | CLEF (5) | Relative Performance | CLEF (Full) | Relative Performance |
  |---|---|:---:|:---:|:---:|:---:|:---:|---|---|---|

  | snowflake-arctic-m-v2.0 | 768 | 0.554 | N/A | 0.552 | N/A | 0.517 | N/A | 0.539 | N/A |
  | snowflake-arctic-m-v2.0 | 256 | 0.544 | -1.81% | 0.54 | -2.17% | 0.506 | -2.13% | 0.523 | -3.06% |
 
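A minimal sketch (not from the README) of one way to realize the 128-bytes-per-vector figure: truncate an MRL-trained embedding to 256 dimensions, re-normalize, and pack it with a naive symmetric Int4 quantizer. The 256-dimension cut-off and the simple quantizer below are illustrative assumptions, not the quantization-aware scheme used in training.

```python
import numpy as np

def truncate_and_quantize(embedding: np.ndarray, dim: int = 256) -> np.ndarray:
    """Truncate an MRL embedding, re-normalize, and pack to Int4 (two values per byte)."""
    vec = embedding[:dim]
    vec = vec / np.linalg.norm(vec)  # re-normalize after truncation
    # Map each value into the signed 4-bit range [-8, 7] with a per-vector scale.
    q = np.clip(np.round(vec / np.abs(vec).max() * 7), -8, 7).astype(np.int8)
    nibbles = (q & 0x0F).astype(np.uint8)
    return (nibbles[0::2] << 4) | nibbles[1::2]  # 256 dims -> 128 bytes

# Stand-in vector for illustration; in practice `embedding` would come from
# snowflake-arctic-embed-l-v2.0 (1024-dim, L2-normalized), e.g. via the usage examples below.
embedding = np.random.randn(1024).astype(np.float32)
embedding /= np.linalg.norm(embedding)
packed = truncate_and_quantize(embedding)
print(packed.nbytes)  # 128
```
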
+ ## Usage

+ ### Using Sentence Transformers

+ ```python
+ from sentence_transformers import SentenceTransformer

+ # Load the model
+ model_name = 'Snowflake/snowflake-arctic-embed-l-v2.0'
+ model = SentenceTransformer(model_name)

+ # Define the queries and documents
+ queries = ['what is snowflake?', 'Where can I get the best tacos?']
+ documents = ['The Data Cloud!', 'Mexico City of Course!']

+ # Compute embeddings: use `prompt_name="query"` to encode queries!
+ query_embeddings = model.encode(queries, prompt_name="query")
+ document_embeddings = model.encode(documents)

+ # Compute cosine similarity scores
+ scores = model.similarity(query_embeddings, document_embeddings)

+ # Output the results
+ for query, query_scores in zip(queries, scores):
+     doc_score_pairs = list(zip(documents, query_scores))
+     doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
+     print("Query:", query)
+     for document, score in doc_score_pairs:
+         print(score, document)

+ ```
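
To get the smaller MRL vectors discussed above from this same interface, recent versions of sentence-transformers (v2.7+) expose a `truncate_dim` option; the sketch below assumes that option is available in your installed version:

```python
from sentence_transformers import SentenceTransformer

# Truncate embeddings to 256 dimensions at encode time (MRL).
model = SentenceTransformer('Snowflake/snowflake-arctic-embed-l-v2.0', truncate_dim=256)
embeddings = model.encode(['The Data Cloud!'])
print(embeddings.shape)  # expected: (1, 256)
```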
+ ### Using Huggingface Transformers

+ You can use the Hugging Face transformers package with Snowflake's arctic-embed model, as shown below. For optimal retrieval quality, use the CLS token to embed each text portion and use the query prefix below (just on the query).
 
  ```python
  import torch

  model = AutoModel.from_pretrained(model_name, add_pooling_layer=False)
  model.eval()

+ query_prefix = 'Query: '
  queries = ['what is snowflake?', 'Where can I get the best tacos?']
  queries_with_prefix = ["{}{}".format(query_prefix, i) for i in queries]
  query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt', max_length=512)
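  # --- Hedged completion (not part of this diff): the hunk cuts off here. A typical
  # --- continuation of the CLS-pooling flow described above, with document texts assumed
  # --- from the Sentence Transformers example, might look like the following sketch.
  documents = ['The Data Cloud!', 'Mexico City of Course!']
  document_tokens = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=512)

  # Run the model and keep the CLS token (position 0) as the text embedding.
  with torch.no_grad():
      query_embeddings = model(**query_tokens)[0][:, 0]
      document_embeddings = model(**document_tokens)[0][:, 0]

  # L2-normalize so that a dot product equals cosine similarity.
  query_embeddings = torch.nn.functional.normalize(query_embeddings, p=2, dim=1)
  document_embeddings = torch.nn.functional.normalize(document_embeddings, p=2, dim=1)

  scores = query_embeddings @ document_embeddings.T
  print(scores)
  ```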