Andrianos committed · Commit e75f912 · verified · 1 Parent(s): 1611849

Update README.md

  1. README.md +4 -4
README.md CHANGED
@@ -84,7 +84,7 @@ model-index:
  name: LB<->DE accuracy
  license: agpl-3.0
  datasets:
- - Andrianos/HistLuxAlign
+ - impresso-project/HistLuxAlign
  - fredxlpy/LuxAlign
  language:
  - lb
@@ -105,7 +105,7 @@ This is an [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NL
 
  Whilst this model natively supports inputs up to 8192, all of our evaluations are on sentence level so there are no guarantees on it's longer text embedding capabilities of Historical Luxembourgish.
 
- We also release a model that performs better (18pp) on ParaLUX. If finding monolingual exact matches within adversarial collections is of at-most importance, please use [histlux-paraphrase-multilingual-mpnet-base-v2](Andrianos/histlux-paraphrase-multilingual-mpnet-base-v2)
+ We also release a model that performs better (18pp) on ParaLUX. If finding monolingual exact matches within adversarial collections is of at-most importance, please use [histlux-paraphrase-multilingual-mpnet-base-v2](impresso-project/histlux-paraphrase-multilingual-mpnet-base-v2)
 
  ### Model Description
  - **Model Type:** GTE-Multilingual-Base
@@ -131,7 +131,7 @@ Then you can use the model like this:
  from sentence_transformers import SentenceTransformer
  sentences = ["This is an example sentence", "Each sentence is converted"]
 
- model = SentenceTransformer('Andrianos/histlux-gte-multilingual-base', trust_remote_code=True)
+ model = SentenceTransformer('impresso-project/histlux-gte-multilingual-base', trust_remote_code=True)
  embeddings = model.encode(sentences)
  print(embeddings)
  ```
@@ -169,7 +169,7 @@ SIB-200(LB): 62.35
 
  The parallel sentences data mix is the following:
 
- Andrianos/HistLuxAlign:
+ impresso-project/HistLuxAlign:
  - LB-FR (x20,000)
  - LB-EN (x20,000)
  - LB-DE (x20,000)
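
Note: all four changed lines swap the `Andrianos/` namespace for `impresso-project/` on three repository names. For downstream scripts or notes that still carry the old ids, the same substitution can be sketched as below. This is a minimal illustration, not part of the commit: the helper name `migrate_repo_ids` and the constant names are hypothetical, and the list of moved repositories is an assumption taken only from the lines visible in this diff.

```python
import re

# Assumption: only the three repository names visible in this diff were moved.
MOVED_REPOS = (
    "HistLuxAlign",
    "histlux-paraphrase-multilingual-mpnet-base-v2",
    "histlux-gte-multilingual-base",
)
OLD_NAMESPACE = "Andrianos"
NEW_NAMESPACE = "impresso-project"


def migrate_repo_ids(text: str) -> str:
    """Rewrite old-namespace repo ids to the new namespace (hypothetical helper)."""
    names = "|".join(re.escape(name) for name in MOVED_REPOS)
    # The negative lookahead leaves longer, unrelated names (e.g. a "-v2" variant) alone.
    pattern = re.compile(rf"\b{re.escape(OLD_NAMESPACE)}/({names})(?![\w-])")
    return pattern.sub(lambda m: f"{NEW_NAMESPACE}/{m.group(1)}", text)


if __name__ == "__main__":
    old_line = (
        "model = SentenceTransformer("
        "'Andrianos/histlux-gte-multilingual-base', trust_remote_code=True)"
    )
    print(migrate_repo_ids(old_line))
```

Ids under other namespaces, such as `fredxlpy/LuxAlign`, are left untouched by this sketch.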