akhooli/arabic-colbertv2-250k-norm
This Arabic ColBERT model is reasonably, but not fully, trained on 250k normalized queries sampled from the Arabic mMARCO dataset.
Training parameters are in the metadata file.
See https://www.linkedin.com/posts/akhooli_arabic-bert-tokenizers-you-may-need-to-normalize-activity-7225747473523216384-D1oH
Please note that there is another model trained (partially) on a normalized 711k-query dataset: akhooli/arabic-colbertv2-711k-norm.
This model should be good for ranking and retrieval, but not for critical tasks. A demo that uses it is the Quran Semantic Search. If you downloaded it before Aug. 6, 2024, you are advised to refresh your copy.
You need to normalize your query and document(s) for better results:
from unicodedata import normalize

# NFKC-normalize any text before passing it to the model
query_n = normalize('NFKC', query)
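
A minimal end-to-end sketch, assuming the RAGatouille library is installed and using placeholder documents, a placeholder query, and a placeholder index name (none of these are prescribed by this card):

from unicodedata import normalize
from ragatouille import RAGPretrainedModel  # assumed wrapper; any ColBERT-compatible loader works

# Placeholder documents and query.
docs = ["النص الأول", "النص الثاني"]
query = "سؤال تجريبي"

# NFKC-normalize both documents and query, as recommended above.
docs_n = [normalize('NFKC', d) for d in docs]
query_n = normalize('NFKC', query)

# Index the normalized documents, then search with the normalized query.
RAG = RAGPretrainedModel.from_pretrained("akhooli/arabic-colbertv2-250k-norm")
RAG.index(collection=docs_n, index_name="arabic_demo")
results = RAG.search(query=query_n, k=2)
print(results)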