Andrianos committed on
Commit 1611849 · verified · 1 Parent(s): 583c9e8

Update README.md

Files changed (1)
  1. README.md +10 -1
README.md CHANGED
@@ -94,10 +94,19 @@ language:
 
  This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-base) further adapted to support Historical and Contemporary Luxembourgish. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for (cross-lingual) semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
- ## Limitations
 
  ## Model Details
 
+ This model is specialised for cross-lingual semantic search to and from Historical/Contemporary Luxembourgish. It would be particularly useful for libraries and archives that want to perform semantic search and longitudinal studies within their collections.
+
+ This is an [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-base) model that was further adapted by Michail et al. (2025).
+
+ ## Limitations
+
+ Whilst this model natively supports inputs of up to 8192 tokens, all of our evaluations are at sentence level, so there are no guarantees on its embedding capabilities for longer Historical Luxembourgish texts.
+
+ We also release a model that performs better (by 18pp) on ParaLUX. If finding monolingual exact matches within adversarial collections is of utmost importance, please use [histlux-paraphrase-multilingual-mpnet-base-v2](Andrianos/histlux-paraphrase-multilingual-mpnet-base-v2).
+
  ### Model Description
  - **Model Type:** GTE-Multilingual-Base
  - **Base model:** [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-base)