Update README.md
README.md
CHANGED
@@ -84,7 +84,7 @@ model-index:
 name: LB<->DE accuracy
 license: agpl-3.0
 datasets:
--
+- impresso-project/HistLuxAlign
 - fredxlpy/LuxAlign
 language:
 - lb
@@ -105,7 +105,7 @@ This is an [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NL
 
 Whilst this model natively supports inputs up to 8192, all of our evaluations are on sentence level so there are no guarantees on it's longer text embedding capabilities of Historical Luxembourgish.
 
-We also release a model that performs better (18pp) on ParaLUX. If finding monolingual exact matches within adversarial collections is of at-most importance, please use [histlux-paraphrase-multilingual-mpnet-base-v2](
+We also release a model that performs better (18pp) on ParaLUX. If finding monolingual exact matches within adversarial collections is of at-most importance, please use [histlux-paraphrase-multilingual-mpnet-base-v2](impresso-project/histlux-paraphrase-multilingual-mpnet-base-v2)
 
 ### Model Description
 - **Model Type:** GTE-Multilingual-Base
@@ -131,7 +131,7 @@ Then you can use the model like this:
 from sentence_transformers import SentenceTransformer
 sentences = ["This is an example sentence", "Each sentence is converted"]
 
-model = SentenceTransformer('
+model = SentenceTransformer('impresso-project/histlux-gte-multilingual-base', trust_remote_code=True)
 embeddings = model.encode(sentences)
 print(embeddings)
 ```
@@ -169,7 +169,7 @@ SIB-200(LB): 62.35
 
 The parallel sentences data mix is the following:
 
-
+impresso-project/HistLuxAlign:
 - LB-FR (x20,000)
 - LB-EN (x20,000)
 - LB-DE (x20,000)