Update README.md

README.md
@@ -1082,6 +1082,10 @@ model-index:
 It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence lengths.
 We have designed it for high performance in monolingual & cross-language applications and trained it specifically to support mixed Chinese-English input without bias.
 
+`jina-embeddings-v2-base-zh` is a bilingual Chinese-English text embedding model that supports encoding texts of up to 8192 characters.
+The model is based on the BERT architecture (JinaBERT), an improvement on BERT that applies [ALiBi](https://arxiv.org/abs/2108.12409) to an encoder architecture for the first time in order to support longer sequences.
+Unlike previous monolingual/multilingual embedding models, we designed this bilingual model to better support both monolingual (Chinese-to-Chinese) and cross-lingual (Chinese-to-English) document retrieval.
+
 The embedding model was trained using a 512 sequence length, but extrapolates to an 8k sequence length (or even longer) thanks to ALiBi.
 This makes our model useful for a range of use cases, especially when processing long documents is needed, including long document retrieval, semantic textual similarity, text reranking, recommendation, RAG, and LLM-based generative search.
@@ -1175,7 +1179,7 @@ According to the latest blog post from [LLamaIndex](https://blog.llamaindex.ai/b
 ## Plans
 
 1. Bilingual embedding models supporting more European & Asian languages, including Spanish, French, Italian and Japanese.
-2. Multimodal embedding models enable
+2. Multimodal embedding models to enable multimodal RAG applications.
 3. High-performance rerankers.
 
 ## Contact
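The symmetric bidirectional ALiBi variant mentioned in the diff above penalizes attention scores in proportion to the absolute distance between token positions, which is what lets a model trained at length 512 extrapolate to much longer inputs. The following numpy sketch illustrates the shape of that bias; the function names and the per-head slope formula (the geometric sequence from the ALiBi paper, for a power-of-two head count) are illustrative, not the model's actual implementation:

```python
import numpy as np

def alibi_slopes(num_heads: int) -> np.ndarray:
    # Geometric sequence of per-head slopes from the ALiBi paper
    # (assumes num_heads is a power of two): 2^(-8h/num_heads), h = 1..num_heads.
    return np.array([2.0 ** (-8.0 / num_heads * h) for h in range(1, num_heads + 1)])

def symmetric_alibi_bias(seq_len: int, num_heads: int) -> np.ndarray:
    # Symmetric (bidirectional) variant for encoders: the penalty depends
    # only on |i - j|, so tokens attend equally to left and right context.
    positions = np.arange(seq_len)
    distance = np.abs(positions[:, None] - positions[None, :])  # (L, L)
    slopes = alibi_slopes(num_heads)                            # (heads,)
    return -slopes[:, None, None] * distance[None, :, :]        # (heads, L, L)

bias = symmetric_alibi_bias(seq_len=6, num_heads=8)
print(bias.shape)  # (8, 6, 6)
```

Because the bias is a fixed function of distance rather than a learned positional embedding, the same formula applies unchanged to sequence lengths never seen in training.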
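For the retrieval use cases the README describes (Chinese-to-Chinese and Chinese-to-English search), embedding pairs are typically scored with cosine similarity. A minimal sketch, using hand-made dummy vectors in place of real model outputs; a bilingual model would map an English sentence and its Chinese counterpart to nearby points in the same space:

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: the standard score for embedding-based
    # retrieval and semantic textual similarity.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Dummy 4-d embeddings standing in for the model's outputs for
# "How is the weather today?" and its Chinese translation.
en = np.array([0.9, 0.1, 0.3, 0.2])
zh = np.array([0.8, 0.2, 0.35, 0.15])
print(round(cos_sim(en, zh), 3))  # close to 1.0 for semantically similar texts
```

Ranking documents by this score against a query embedding is all that long-document retrieval, reranking, and RAG pipelines need from the embedding model.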