spacemanidol committed · verified
Commit edc2df7 · 1 Parent(s): 41f336f

Upload README.md

Files changed (1): README.md (+5 -3)

README.md CHANGED
@@ -9076,7 +9076,9 @@ Key Features:
 
 3. Compression-friendly: Achieves high-quality retrieval with embeddings as small as 128 bytes/vector using Matryoshka Representation Learning (MRL) and quantization-aware embedding training.
 
-4. Drop-In Replacement: arctic-embed-l-v2.0 builds on [XLMR-Large](https://huggingface.co/FacebookAI/xlm-roberta-large), which allows direct drop-in inference replacement with new libraries, kernels, inference engines, etc.
+4. Drop-In Replacement: arctic-embed-l-v2.0 builds on [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae), which allows direct drop-in inference replacement with new libraries, kernels, inference engines, etc.
+
+5. Long Context Support: arctic-embed-l-v2.0 builds on [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae), which supports a context window of up to 8192 tokens via RoPE.
 
 
 ### Quality Benchmarks
@@ -9151,10 +9153,10 @@ model.eval()
 query_prefix = 'query: '
 queries = ['what is snowflake?', 'Where can I get the best tacos?']
 queries_with_prefix = ["{}{}".format(query_prefix, i) for i in queries]
-query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt', max_length=512)
+query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt', max_length=8192)
 
 documents = ['The Data Cloud!', 'Mexico City of Course!']
-document_tokens = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=512)
+document_tokens = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=8192)
 
 # Compute token embeddings
 with torch.no_grad():
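The 128 bytes/vector figure in point 3 comes from combining the two techniques the README names: MRL lets the leading dimensions of the embedding stand alone as a smaller embedding, and quantization-aware training keeps quality high after scalar quantization. Below is a minimal NumPy sketch of one way to hit exactly 128 bytes, assuming a 128-dimension MRL prefix stored as int8 (128 × 1 byte); the model card's actual recipe may differ (e.g., 256 dimensions at 4 bits also yields 128 bytes), and the helper names here are hypothetical.

```python
import numpy as np

def compress_embedding(vec: np.ndarray, dim: int = 128) -> np.ndarray:
    """Truncate an MRL-trained embedding to its leading `dim` dimensions,
    re-normalize the prefix, and scalar-quantize to int8 (1 byte per dim)."""
    prefix = vec[:dim]
    prefix = prefix / np.linalg.norm(prefix)  # re-normalize so cosine scoring still works
    return np.clip(np.round(prefix * 127.0), -127, 127).astype(np.int8)

def int8_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Approximate cosine similarity between two int8-quantized vectors."""
    a32, b32 = a.astype(np.int32), b.astype(np.int32)
    return float(a32 @ b32) / (np.linalg.norm(a32) * np.linalg.norm(b32))
```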
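The snippet in the diff stops at `with torch.no_grad():`. For context, here is a sketch of how such an inference snippet typically continues for the arctic-embed models, using the new 8192-token limit from this commit. The CLS-token pooling (`[:, 0]`), the L2 normalization, and the dot-product scoring shown after the forward pass are assumptions based on common embedding-model usage, not lines from this diff; verify the model id and pooling against the full model card.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Model id inferred from the card's naming; adjust if the card differs.
model_name = 'Snowflake/snowflake-arctic-embed-l-v2.0'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

query_prefix = 'query: '
queries = ['what is snowflake?', 'Where can I get the best tacos?']
queries_with_prefix = ["{}{}".format(query_prefix, i) for i in queries]
query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt', max_length=8192)

documents = ['The Data Cloud!', 'Mexico City of Course!']
document_tokens = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=8192)

# Compute token embeddings
with torch.no_grad():
    query_embeddings = model(**query_tokens)[0][:, 0]        # CLS-token pooling (assumption)
    document_embeddings = model(**document_tokens)[0][:, 0]

# L2-normalize so the dot product below is cosine similarity
query_embeddings = torch.nn.functional.normalize(query_embeddings, p=2, dim=1)
document_embeddings = torch.nn.functional.normalize(document_embeddings, p=2, dim=1)
scores = query_embeddings @ document_embeddings.T            # (num_queries, num_docs)
```

Note that raising `max_length` to 8192 only changes truncation behavior at tokenization time; memory use during the forward pass grows with the actual sequence length, so long inputs may need smaller batches.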