Update README.md
README.md CHANGED

````diff
@@ -9,7 +9,7 @@ tags:
 - sentence-embedding
 - mteb
 model-index:
-- name: bilingual-embedding
+- name: bilingual-document-embedding
   results:
   - task:
       type: Clustering
@@ -1527,9 +1527,9 @@ metrics:
     - spearmanr
 ---
 
-# [bilingual-embedding
+# [bilingual-document-embedding](https://huggingface.co/Lajavaness/bilingual-document-embedding)
 
-bilingual-embedding is an embedding model for a bilingual language pair: French and English. It is a sentence-embedding model trained specifically for this bilingual setting, leveraging the robust capabilities of [BGE M3](https://huggingface.co/BAAI/bge-m3), a pre-trained language model based on the XLM-RoBERTa architecture. The model uses XLM-RoBERTa to encode English-French sentences into a 1024-dimensional vector space, supporting a wide range of applications from semantic search to text clustering. The embeddings capture the nuanced meanings of English-French sentences, reflecting both the lexical and contextual layers of the language.
+bilingual-document-embedding is an embedding model for documents in a bilingual language pair, French and English, with a context length of up to 8096 tokens. It is a sentence-embedding model trained specifically for this bilingual setting, leveraging the robust capabilities of [BGE M3](https://huggingface.co/BAAI/bge-m3), a pre-trained language model based on the XLM-RoBERTa architecture. The model uses XLM-RoBERTa to encode English-French sentences into a 1024-dimensional vector space, supporting a wide range of applications from semantic search to text clustering. The embeddings capture the nuanced meanings of English-French sentences, reflecting both the lexical and contextual layers of the language.
 
 
 ## Full Model Architecture
@@ -1568,7 +1568,7 @@ from sentence_transformers import SentenceTransformer
 
 sentences = ["Paris est une capitale de la France", "Paris is a capital of France"]
 
-model = SentenceTransformer('Lajavaness/bilingual-embedding
+model = SentenceTransformer('Lajavaness/bilingual-document-embedding', trust_remote_code=True)
 print(embeddings)
 
 ```
````
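Note that the README's snippet prints `embeddings` without ever computing them; in the sentence-transformers API the vectors would come from `embeddings = model.encode(sentences)`. Since running the real model requires downloading it, here is a minimal, self-contained sketch of the downstream comparison step the card alludes to (semantic search via cosine similarity), using small stand-in vectors in place of the model's 1024-dimensional embeddings; the vector values and helper name are illustrative, not part of the model card:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for model.encode(sentences): in practice these would be
# 1024-dimensional vectors produced by the embedding model.
emb_fr = np.array([0.8, 0.1, 0.1])     # "Paris est une capitale de la France"
emb_en = np.array([0.7, 0.2, 0.1])     # "Paris is a capital of France"
emb_other = np.array([0.0, 0.1, 0.9])  # an unrelated sentence

# A bilingual model should place a sentence and its translation close
# together, and unrelated content farther apart.
print(cosine_similarity(emb_fr, emb_en))
print(cosine_similarity(emb_fr, emb_other))
```

With the actual model, the same comparison applies directly to the rows of `model.encode(sentences)`, since `encode` returns one vector per input sentence.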
|