Update README.md
README.md
CHANGED
|
@@ -15,9 +15,23 @@ license: apache-2.0
|
|
| 15 |
|
| 16 |
The text embedding suite trained by [Jina AI](https://github.com/jina-ai), [Finetuner team](https://github.com/jina-ai/finetuner).
|
| 17 |
|
| 18 |
-
## Intended Usage
|
| 19 |
|
| 20 |
-
|
| 21 |
|
| 22 |
## Data & Parameters
|
| 23 |
|
|
|
|
| 15 |
|
| 16 |
The text embedding suite trained by [Jina AI](https://github.com/jina-ai), [Finetuner team](https://github.com/jina-ai/finetuner).
|
| 17 |
|
| 18 |
+
## Intended Usage & Model Info
|
| 19 |
|
| 20 |
+
`jina-embedding-s-en-v1` is a language model that has been trained using Jina AI's Linnaeus-Clean dataset.
|
| 21 |
+
This dataset consists of 380 million sentence pairs, including query-document pairs.
|
| 22 |
+
These pairs were obtained from various domains and were carefully selected through a thorough cleaning process.
|
| 23 |
+
The Linnaeus-Full dataset, from which the Linnaeus-Clean dataset is derived, originally contained 1.6 billion sentence pairs.
|
| 24 |
+
|
| 25 |
+
The model has a range of use cases, including information retrieval, semantic textual similarity, text reranking, and more.
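For retrieval and semantic similarity, embeddings are commonly compared with cosine similarity. The following is a minimal sketch of that comparison, not the model's own API: the three-dimensional vectors are hypothetical stand-ins for real embeddings, and `cosine_similarity` and `rank_documents` are illustrative helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (document index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors; real embeddings would come from the model.
query = [1.0, 0.0, 1.0]
docs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 0.0, 0.9],  # nearly parallel to the query
    [0.5, 0.5, 0.5],  # partially aligned
]

ranking = rank_documents(query, docs)
for index, score in ranking:
    print(f"doc {index}: {score:.4f}")
```

In a real pipeline, the toy vectors would be replaced by the model's output for each query and document; the ranking step stays the same.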
|
| 26 |
+
|
| 27 |
+
With a compact size of just 35 million parameters,
|
| 28 |
+
the model enables lightning-fast inference while still delivering impressive performance.
|
| 29 |
+
We also provide the following larger models:
|
| 30 |
+
|
| 31 |
+
- jina-embedding-b-en-v1: 110 million parameters.
|
| 32 |
+
- jina-embedding-l-en-v1: 800 million parameters.
|
| 33 |
+
- jina-embedding-xl-en-v1: 3 billion parameters.
|
| 34 |
+
- jina-embedding-xxl-en-v1: 11 billion parameters.
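To put the parameter counts above in perspective, the sketch below estimates the memory needed to hold each model's weights. It assumes 4 bytes per parameter (32-bit floats) by default and counts weights only; activations, tokenizer data, and framework overhead are ignored, so treat the numbers as rough lower bounds. `weight_memory_gb` is an illustrative helper, not part of any library.

```python
# Parameter counts as listed in this README.
PARAM_COUNTS = {
    "jina-embedding-s-en-v1": 35_000_000,
    "jina-embedding-b-en-v1": 110_000_000,
    "jina-embedding-l-en-v1": 800_000_000,
    "jina-embedding-xl-en-v1": 3_000_000_000,
    "jina-embedding-xxl-en-v1": 11_000_000_000,
}


def weight_memory_gb(num_params, bytes_per_param=4):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Assumes every parameter uses bytes_per_param bytes:
    4 for float32, 2 for float16.
    """
    return num_params * bytes_per_param / 1e9


for name, count in PARAM_COUNTS.items():
    print(f"{name}: ~{weight_memory_gb(count):.2f} GB in float32")
```

Passing `bytes_per_param=2` gives the corresponding half-precision estimate.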
|
| 35 |
|
| 36 |
## Data & Parameters
|
| 37 |
|