Upload README.md
README.md
@@ -16,7 +16,7 @@ widget:
 
 ---
 
-<h1 align="center">
+<h1 align="center">Stella-PL</h1>
 
 This is a bilingual Polish-English text encoder based on [stella_en_1.5B_v5](https://huggingface.co/dunzhang/stella_en_1.5B_v5). We adapted the model for Polish with [multilingual knowledge distillation method](https://aclanthology.org/2020.emnlp-main.365/) using a diverse corpus of 20 million Polish-English text pairs.
 It transforms texts to 1024 dimensional vectors. For English texts, the produced embeddings should be similar to the original Stella model. The encoder can be used to compare embeddings in the same language (Polish or English), as well as across languages.
@@ -73,7 +73,7 @@ print(cos_sim(emb, emb))
 
 ## Evaluation Results
 
-
+The model achieves **NDCG@10** of **60.52** on the Polish Information Retrieval Benchmark. See [PIRB Leaderboard](https://huggingface.co/spaces/sdadas/pirb) for detailed results.
 
 ## Citation
 
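The README says the encoder's vectors can be compared within one language or across Polish and English, and its own usage snippet calls `cos_sim`. As a rough illustration of what such a comparison computes, here is a minimal pure-Python sketch of cosine similarity. The `cosine_similarity` helper and the 3-dimensional toy vectors are hypothetical stand-ins for the model's real 1024-dimensional embeddings; this is not the model's or any library's code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional stand-ins for real 1024-dimensional embeddings.
emb_pl = [0.9, 0.1, 0.3]       # e.g. a Polish sentence
emb_en = [0.8, 0.2, 0.3]       # its English translation
emb_other = [-0.2, 0.9, -0.1]  # an unrelated sentence

# A translation pair should score closer to 1.0 than an unrelated pair.
print(cosine_similarity(emb_pl, emb_en))
print(cosine_similarity(emb_pl, emb_other))
```

Cosine similarity ignores vector length and compares only direction, which is why the same measure works whether both texts are Polish, both English, or one of each.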
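The evaluation line added by this commit reports NDCG@10. For readers unfamiliar with the metric, the sketch below computes NDCG@k with linear relevance gains and a log2(rank + 1) discount. This is one common variant and an assumption here, not necessarily the exact implementation behind PIRB. The helper names and the relevance lists are hypothetical. Assuming the leaderboard expresses scores as percentages, 60.52 would correspond to 0.6052 on the 0 to 1 scale returned below.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain variant)."""
    # enumerate is zero-based, so position i has 1-based rank i + 1
    # and discount log2(rank + 1) = log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked: list[float], judged: list[float], k: int = 10) -> float:
    """DCG of the system ranking divided by DCG of the ideal ranking.

    ranked: relevance labels of retrieved documents, in ranked order.
    judged: relevance labels of all judged documents for the query.
    """
    idcg = dcg_at_k(sorted(judged, reverse=True), k)
    if idcg == 0:
        return 0.0  # no relevant documents: define the score as 0
    return dcg_at_k(ranked, k) / idcg


# Hypothetical query: the most relevant document was retrieved at rank 2.
ranked = [0, 2, 1, 0, 0, 1, 0, 0, 0, 0]
judged = [2, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(ndcg_at_k(ranked, judged, k=10))
```

Only the first k positions count, so a relevant document retrieved at rank 11 contributes nothing to NDCG@10.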