Update README.md

README.md
@@ -1082,6 +1082,10 @@ model-index:
 It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence lengths.
 We have designed it for high performance in monolingual & cross-language applications and trained it specifically to support mixed Chinese-English input without bias.
 
+`jina-embeddings-v2-base-zh` is a bilingual Chinese-English text embedding model that supports encoding texts of up to 8192 characters.
+The model is based on the BERT architecture (JinaBERT), an improvement on BERT that applies [ALiBi](https://arxiv.org/abs/2108.12409) to an encoder architecture for the first time in order to support longer sequences.
+Unlike previous monolingual/multilingual embedding models, we designed this bilingual model to better support both monolingual (Chinese-to-Chinese) and cross-lingual (Chinese-to-English) document retrieval.
+
 The embedding model was trained using a 512 sequence length, but extrapolates to an 8k sequence length (or even longer) thanks to ALiBi.
 This makes our model useful for a range of use cases, especially when processing long documents is needed, including long document retrieval, semantic textual similarity, text reranking, recommendation, RAG, and LLM-based generative search.
@@ -1175,7 +1179,7 @@ According to the latest blog post from [LLamaIndex](https://blog.llamaindex.ai/b
 ## Plans
 
 1. Bilingual embedding models supporting more European & Asian languages, including Spanish, French, Italian and Japanese.
-2. Multimodal embedding models enable
+2. Multimodal embedding models to enable multimodal RAG applications.
 3. High-performance rerankers.
 
 ## Contact
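The symmetric bidirectional ALiBi variant mentioned in the diff above penalizes attention scores in proportion to the absolute distance between token positions, which is what lets a model trained at length 512 extrapolate to much longer inputs. The following numpy sketch illustrates the shape of that bias; the function names and the per-head slope formula (the geometric sequence from the ALiBi paper, for a power-of-two head count) are illustrative, not the model's actual implementation:

```python
import numpy as np

def alibi_slopes(num_heads: int) -> np.ndarray:
    # Geometric sequence of per-head slopes from the ALiBi paper
    # (assumes num_heads is a power of two): 2^(-8h/num_heads), h = 1..num_heads.
    return np.array([2.0 ** (-8.0 / num_heads * h) for h in range(1, num_heads + 1)])

def symmetric_alibi_bias(seq_len: int, num_heads: int) -> np.ndarray:
    # Symmetric (bidirectional) variant for encoders: the penalty depends
    # only on |i - j|, so tokens attend equally to left and right context.
    positions = np.arange(seq_len)
    distance = np.abs(positions[:, None] - positions[None, :])  # (L, L)
    slopes = alibi_slopes(num_heads)                            # (heads,)
    return -slopes[:, None, None] * distance[None, :, :]        # (heads, L, L)

bias = symmetric_alibi_bias(seq_len=6, num_heads=8)
print(bias.shape)  # (8, 6, 6)
```

Because the bias is a fixed function of distance rather than a learned positional embedding, the same formula applies unchanged to sequence lengths never seen in training.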
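For the retrieval use cases the README describes (Chinese-to-Chinese and Chinese-to-English search), embedding pairs are typically scored with cosine similarity. A minimal sketch, using hand-made dummy vectors in place of real model outputs; a bilingual model would map an English sentence and its Chinese counterpart to nearby points in the same space:

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: the standard score for embedding-based
    # retrieval and semantic textual similarity.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Dummy 4-d embeddings standing in for the model's outputs for
# "How is the weather today?" and its Chinese translation.
en = np.array([0.9, 0.1, 0.3, 0.2])
zh = np.array([0.8, 0.2, 0.35, 0.15])
print(round(cos_sim(en, zh), 3))  # close to 1.0 for semantically similar texts
```

Ranking documents by this score against a query embedding is all that long-document retrieval, reranking, and RAG pipelines need from the embedding model.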