---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:498670
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Alibaba-NLP/gte-multilingual-base
widget:
- source_sentence: كم يبغ عدد السكان في المملكة المتحدة؟
  sentences:
  - هناك العديد من الناس الحاضرين.
  - كم عدد سكان أوكرانيا؟
  - لماذا باراك أوباما غير مؤهل للترشح في انتخابات الرئاسة لعام 2016؟
- source_sentence: ماذا يجب أن أعرف عن ممارسة الأعمال التجارية في بلدك كرائد أعمال؟
  sentences:
  - >-
    إذا كان بإمكانك العيش في أي مكان في العالم لمدة عام، أين سيكون ذلك
    ولماذا؟
  - ماذا يجب أن أعطي صديقي في عيد الميلاد؟
  - ماذا يجب أن أعرف عن ممارسة الأعمال التجارية في بلدك؟
- source_sentence: الرجل يرسم
  sentences:
  - رجل يستخدم الطلاء الرذاذ لرسم صورة على الحائط.
  - العرض مقرّر غداً.
  - مساء من الترفيه تحت النجوم هو أساسا جنوب كاليفورنيا.
- source_sentence: لماذا لا يزال دونالد ترامب "يتجنب" قضية إقرار ضريبة الدخل؟
  sentences:
  - الحديقة لديها بوابة
  - >-
    لماذا لا يبدأ ترامب في قول "الحقيقة" عن طريق الإفصاح عن إقراراته
    الضريبية؟
  - كيف يمكنني التحقق من حسابي على إنستغرام مع علامة زرقاء؟
- source_sentence: لا أعتقد ذلك
  sentences:
  - رجل واحد في قميص برتقالي يرتدي خوذة بيضاء يركب دراجة.
  - هناك أشخاص يأكلون في مطعم.
  - أخشى لا يا سيدي
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
model-index:
- name: SentenceTransformer based on Alibaba-NLP/gte-multilingual-base
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: arabic sts17
      type: arabic-sts17
    metrics:
    - type: pearson_cosine
      value: 0.8112776989727821
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8156442694344616
      name: Spearman Cosine
---
# SentenceTransformer based on Alibaba-NLP/gte-multilingual-base

This is a [sentence-transformers](https://www.sbert.net) model finetuned from [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-base)
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://www.sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
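The three modules map directly onto the encoding pipeline: run the transformer, take the `[CLS]` token as the sentence embedding (`pooling_mode_cls_token: True`), then L2-normalize. As a rough sketch of the same pipeline in plain `transformers` (using the base model id, since the fine-tuned checkpoint shares this architecture; `trust_remote_code=True` is assumed to be acceptable for the custom `NewModel` class):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Base model id; the fine-tuned checkpoint shares the same architecture.
model_id = "Alibaba-NLP/gte-multilingual-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)  # custom NewModel class

texts = ["لا أعتقد ذلك", "أخشى لا يا سيدي"]
batch = tokenizer(texts, padding=True, truncation=True, max_length=8192, return_tensors="pt")

with torch.no_grad():
    out = model(**batch)

# (1) Pooling with pooling_mode_cls_token=True: take the first ([CLS]) token
cls = out.last_hidden_state[:, 0]
# (2) Normalize(): L2-normalize so dot products equal cosine similarities
embeddings = F.normalize(cls, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 768])
```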
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
# (trust_remote_code=True is assumed to be required here, since the base
#  model's architecture is a custom NewModel class)
model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)

# Run inference
sentences = [
    'لا أعتقد ذلك',
    'أخشى لا يا سيدي',
    'رجل واحد في قميص برتقالي يرتدي خوذة بيضاء يركب دراجة.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
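Because the model was trained with `MatryoshkaLoss` over dimensions 768, 384, and 128, the leading coordinates of each embedding carry most of the signal, so embeddings can be truncated for cheaper storage and search. A minimal sketch continuing from the snippet above (plain NumPy; re-normalizing after truncation is needed because `Normalize()` only makes the full 768-dimensional vector unit-length):

```python
import numpy as np

dim = 128  # one of the trained Matryoshka dimensions: 768, 384, or 128
truncated = embeddings[:, :dim]
# Re-normalize so cosine similarity is again just a dot product
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)
print(truncated.shape)
# (3, 128)
```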
## Evaluation

### Metrics

#### Semantic Similarity

- Dataset: `arabic-sts17`
- Evaluated with `EmbeddingSimilarityEvaluator`

| Metric          | Value  |
|:----------------|:-------|
| pearson_cosine  | 0.8113 |
| spearman_cosine | 0.8156 |
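To run a comparable evaluation yourself, a hedged sketch is below; the dataset id, config, and column names are assumptions (adjust them to wherever you host the Arabic STS17 pairs), and STS gold scores are rescaled from the usual 0-5 range to [0, 1]:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SimilarityFunction
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)

# Hypothetical dataset id/config/columns for Arabic-Arabic STS17 pairs
ds = load_dataset("mteb/sts17-crosslingual-sts", "ar-ar", split="test")
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=ds["sentence1"],
    sentences2=ds["sentence2"],
    scores=[s / 5.0 for s in ds["score"]],  # rescale 0-5 gold scores to [0, 1]
    main_similarity=SimilarityFunction.COSINE,
    name="arabic-sts17",
)
print(evaluator(model))  # dict of pearson/spearman cosine scores
```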
## Training Details

### Training Dataset

#### Unnamed Dataset

- Size: 498,670 training samples
- Columns: `sentence_0` and `sentence_1`
- Approximate statistics based on the first 1000 samples:

  |         | sentence_0                                            | sentence_1                                            |
  |:--------|:------------------------------------------------------|:------------------------------------------------------|
  | type    | string                                                 | string                                                 |
  | details | min: 4 tokens<br>mean: 19.59 tokens<br>max: 82 tokens | min: 4 tokens<br>mean: 13.98 tokens<br>max: 69 tokens |

- Samples:

  | sentence_0                                | sentence_1                  |
  |:-------------------------------------------|:----------------------------|
  | ولد صغير يرتدي ملابس زرقاء يرتدي حذاء | الصبي الصغير يرتدي ملابسه |
  | كيف يتم بناء كاميرات المراقبة؟ | ما هي كاميرا المراقبة؟ |
  | لماذا الطاقة الإجمالية للكون صفر؟ | إذا كان إجمالي الطاقة في الكون صفر، فهل يعني ذلك أن هناك طريقة لـ "صنع" المادة/الطاقة من خلال صنع نوع من النظير؟ |
- Loss: `MatryoshkaLoss` with these parameters:

  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [768, 384, 128],
      "matryoshka_weights": [1, 1, 1],
      "n_dims_per_step": -1
  }
  ```
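For reference, this combination is typically built by wrapping `MultipleNegativesRankingLoss` (in-batch negatives) in `MatryoshkaLoss`, which applies it at each truncated dimension. A minimal sketch with the parameters above (the dataset and trainer wiring is omitted):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)

# Ranking loss over in-batch negatives, applied at each Matryoshka dimension
base_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    base_loss,
    matryoshka_dims=[768, 384, 128],
    matryoshka_weights=[1, 1, 1],
)
```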
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 24
- `per_device_eval_batch_size`: 24
- `fp16`: True
- `multi_dataset_batch_sampler`: round_robin
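A sketch of how these non-default values map onto `SentenceTransformerTrainingArguments` (the `output_dir` is a placeholder; `num_train_epochs=3` is taken from the full list below):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    fp16=True,
    num_train_epochs=3,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)
```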
#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 24
- `per_device_eval_batch_size`: 24
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>
### Training Logs
Epoch | Step | Training Loss | arabic-sts17_spearman_cosine |
---|---|---|---|
0.0481 | 500 | 1.6592 | - |
0.0963 | 1000 | 1.177 | - |
0.1444 | 1500 | 1.0053 | - |
0.1925 | 2000 | 0.9125 | 0.8135 |
0.2406 | 2500 | 0.8212 | - |
0.2888 | 3000 | 0.8204 | - |
0.3369 | 3500 | 0.7696 | - |
0.3850 | 4000 | 0.7501 | 0.8089 |
0.4332 | 4500 | 0.7118 | - |
0.4813 | 5000 | 0.7073 | - |
0.5294 | 5500 | 0.6772 | - |
0.5775 | 6000 | 0.6637 | 0.8085 |
0.6257 | 6500 | 0.6507 | - |
0.6738 | 7000 | 0.605 | - |
0.7219 | 7500 | 0.6076 | - |
0.7700 | 8000 | 0.6076 | 0.8060 |
0.8182 | 8500 | 0.5594 | - |
0.8663 | 9000 | 0.5928 | - |
0.9144 | 9500 | 0.5587 | - |
0.9626 | 10000 | 0.5736 | 0.8099 |
1.0 | 10389 | - | 0.8122 |
1.0107 | 10500 | 0.555 | - |
1.0588 | 11000 | 0.5233 | - |
1.1069 | 11500 | 0.5216 | - |
1.1551 | 12000 | 0.5176 | 0.8015 |
1.2032 | 12500 | 0.4865 | - |
1.2513 | 13000 | 0.4907 | - |
1.2995 | 13500 | 0.5079 | - |
1.3476 | 14000 | 0.4991 | 0.8027 |
1.3957 | 14500 | 0.4834 | - |
1.4438 | 15000 | 0.4626 | - |
1.4920 | 15500 | 0.4442 | - |
1.5401 | 16000 | 0.4768 | 0.8079 |
1.5882 | 16500 | 0.4459 | - |
1.6363 | 17000 | 0.4409 | - |
1.6845 | 17500 | 0.4434 | - |
1.7326 | 18000 | 0.4264 | 0.8041 |
1.7807 | 18500 | 0.4341 | - |
1.8289 | 19000 | 0.4143 | - |
1.8770 | 19500 | 0.4304 | - |
1.9251 | 20000 | 0.4314 | 0.8133 |
1.9732 | 20500 | 0.448 | - |
2.0 | 20778 | - | 0.8116 |
2.0214 | 21000 | 0.3985 | - |
2.0695 | 21500 | 0.3854 | - |
2.1176 | 22000 | 0.3875 | 0.8095 |
2.1658 | 22500 | 0.4139 | - |
2.2139 | 23000 | 0.3956 | - |
2.2620 | 23500 | 0.3856 | - |
2.3101 | 24000 | 0.3816 | 0.8110 |
2.3583 | 24500 | 0.3732 | - |
2.4064 | 25000 | 0.3662 | - |
2.4545 | 25500 | 0.3773 | - |
2.5026 | 26000 | 0.3703 | 0.8058 |
2.5508 | 26500 | 0.3666 | - |
2.5989 | 27000 | 0.369 | - |
2.6470 | 27500 | 0.3612 | - |
2.6952 | 28000 | 0.3444 | 0.8135 |
2.7433 | 28500 | 0.3667 | - |
2.7914 | 29000 | 0.3707 | - |
2.8395 | 29500 | 0.3698 | - |
2.8877 | 30000 | 0.3658 | 0.8156 |
### Framework Versions
- Python: 3.12.7
- Sentence Transformers: 3.3.1
- Transformers: 4.51.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.4.0
- Datasets: 3.3.2
- Tokenizers: 0.21.1
## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```
#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```