impresso-project/histlux-gte-multilingual-base


Andrianos committed 20 days ago

Commit e75f912 (verified) · 1 Parent(s): 1611849

Update README.md

Files changed (1)

README.md +4 -4

@@ -84,7 +84,7 @@ model-index:
       name: LB<->DE accuracy
 license: agpl-3.0
 datasets:
-- Andrianos/HistLuxAlign
+- impresso-project/HistLuxAlign
 - fredxlpy/LuxAlign
 language:
 - lb
@@ -105,7 +105,7 @@ This is an [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NL
 Whilst this model natively supports inputs up to 8192, all of our evaluations are on sentence level so there are no guarantees on it's longer text embedding capabilities of Historical Luxembourgish.
-We also release a model that performs better (18pp) on ParaLUX. If finding monolingual exact matches within adversarial collections is of at-most importance, please use [histlux-paraphrase-multilingual-mpnet-base-v2](Andrianos/histlux-paraphrase-multilingual-mpnet-base-v2)
+We also release a model that performs better (18pp) on ParaLUX. If finding monolingual exact matches within adversarial collections is of at-most importance, please use [histlux-paraphrase-multilingual-mpnet-base-v2](impresso-project/histlux-paraphrase-multilingual-mpnet-base-v2)
 ### Model Description
 - **Model Type:** GTE-Multilingual-Base
@@ -131,7 +131,7 @@ Then you can use the model like this:
 from sentence_transformers import SentenceTransformer
 sentences = ["This is an example sentence", "Each sentence is converted"]
-model = SentenceTransformer('Andrianos/histlux-gte-multilingual-base', trust_remote_code=True)
+model = SentenceTransformer('impresso-project/histlux-gte-multilingual-base', trust_remote_code=True)
 embeddings = model.encode(sentences)
 print(embeddings)
 ```
@@ -169,7 +169,7 @@ SIB-200(LB): 62.35
 The parallel sentences data mix is the following:
-Andrianos/HistLuxAlign:
+impresso-project/HistLuxAlign:
   - LB-FR (x20,000)
   - LB-EN (x20,000)
   - LB-DE (x20,000)
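
All four changed lines apply the same substitution: the `Andrianos/` namespace becomes `impresso-project/` for the dataset and the two model repositories. The commit does not say how the edits were made, so the following is only a minimal sketch of how such a rename could be applied to README text. The names `migrate_namespace` and `MOVED_REPOS` are hypothetical, and the list of moved repositories is taken solely from the lines changed in this diff.

```python
import re

OLD_NAMESPACE = "Andrianos"
NEW_NAMESPACE = "impresso-project"

# Assumption: only the repositories that appear in this diff have moved.
MOVED_REPOS = (
    "HistLuxAlign",
    "histlux-gte-multilingual-base",
    "histlux-paraphrase-multilingual-mpnet-base-v2",
)

# Match "Andrianos/<repo>" only for the listed repos, and only when the
# repo name is not a prefix of a longer identifier.
_PATTERN = re.compile(
    r"\b"
    + re.escape(OLD_NAMESPACE)
    + "/("
    + "|".join(re.escape(name) for name in MOVED_REPOS)
    + r")(?![\w-])"
)


def migrate_namespace(text: str) -> str:
    """Rewrite old-namespace repo references to the new namespace."""
    return _PATTERN.sub(lambda m: f"{NEW_NAMESPACE}/{m.group(1)}", text)


if __name__ == "__main__":
    old_line = (
        "model = SentenceTransformer("
        "'Andrianos/histlux-gte-multilingual-base', trust_remote_code=True)"
    )
    print(migrate_namespace(old_line))
```

Restricting the pattern to an explicit list of repositories leaves unrelated references, such as `fredxlpy/LuxAlign` or the author name in the commit header, untouched.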