base_model: silma-ai/silma-embeddding-matryoshka-0.1
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:34436
  - loss:CosineSimilarityLoss
widget:
  - source_sentence: Three men are playing chess.
    sentences:
      - Two men are fighting.
      - امرأة تحمل و تحمل طفل كنغر
      - Two men are playing chess.
  - source_sentence: Two men are playing chess.
    sentences:
      - رجل يعزف على الغيتار و يغني
      - Three men are playing chess.
      - طائرة طيران تقلع
  - source_sentence: Two men are playing chess.
    sentences:
      - A man is playing a large flute. رجل يعزف على ناي كبير
      - The man is playing the piano. الرجل يعزف على البيانو
      - Three men are playing chess.
  - source_sentence: الرجل يعزف على البيانو The man is playing the piano.
    sentences:
      - رجل يجلس ويلعب الكمان A man seated is playing the cello.
      - ثلاثة رجال يلعبون الشطرنج.
      - الرجل يعزف على الغيتار The man is playing the guitar.
  - source_sentence: الرجل ضرب الرجل الآخر بعصا The man hit the other man with a stick.
    sentences:
      - الرجل صفع الرجل الآخر بعصا The man spanked the other man with a stick.
      - A plane is taking off.
      - A man is smoking. رجل يدخن
model-index:
  - name: SentenceTransformer based on silma-ai/silma-embeddding-matryoshka-0.1
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev 512
          type: sts-dev-512
        metrics:
          - type: pearson_cosine
            value: 0.8509127994264242
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8548500966032416
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.821303728669975
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8364598068079891
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8210450198328316
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8382181658285147
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.8491261828772604
            name: Pearson Dot
          - type: spearman_dot
            value: 0.8559811107036664
            name: Spearman Dot
          - type: pearson_max
            value: 0.8509127994264242
            name: Pearson Max
          - type: spearman_max
            value: 0.8559811107036664
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev 256
          type: sts-dev-256
        metrics:
          - type: pearson_cosine
            value: 0.8498025312190702
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8530609768738506
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8181745876468085
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8328727236454085
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8193792688284338
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8338632184708783
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.8396368156921546
            name: Pearson Dot
          - type: spearman_dot
            value: 0.8484397673758116
            name: Spearman Dot
          - type: pearson_max
            value: 0.8498025312190702
            name: Pearson Max
          - type: spearman_max
            value: 0.8530609768738506
            name: Spearman Max

SentenceTransformer based on silma-ai/silma-embeddding-matryoshka-0.1

This is a sentence-transformers model fine-tuned from silma-ai/silma-embeddding-matryoshka-0.1. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
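
The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence vector is the average of the token vectors. The following is a minimal pure-Python sketch of that idea on toy data, assuming padding positions are excluded through an attention mask; `mean_pool` is a hypothetical helper written for illustration, not the library's implementation.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding sequence to avoid division by zero.
    count = max(count, 1)
    return [total / count for total in summed]


# Toy input: three tokens of dimension 4; the last token is padding.
tokens = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],
]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 2.0, 2.0, 2.0]
```

In the real model the token vectors have 768 components and come from the BertModel layer; the toy dimension of 4 is only for readability.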

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("silma-ai/silma-embeddding-sts-0.1")
# Run inference
sentences = [
    'الرجل ضرب الرجل الآخر بعصا The man hit the other man with a stick.',
    'الرجل صفع الرجل الآخر بعصا The man spanked the other man with a stick.',
    'A man is smoking. رجل يدخن',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
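
The evaluation sets in this card are named `sts-dev-512` and `sts-dev-256`, and the base model is a Matryoshka model, which suggests (this is an assumption, the card does not say so explicitly) that the 768-dimensional embeddings can be shortened to 512 or 256 dimensions. A minimal sketch of that truncation on a toy vector follows; `truncate_and_normalize` is a hypothetical helper, and the re-normalization step is a common convention rather than something stated in this card.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(embedding: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale them to unit L2 norm."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and the embedding length")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale for an all-zero vector
    return [x / norm for x in head]


# Toy 4-dimensional "embedding" truncated to its first 2 components.
full = [3.0, 4.0, 12.0, 0.0]
short = truncate_and_normalize(full, 2)
print(short)
```

With the real model, the same operation would be applied to each row returned by `model.encode`. Sentence Transformers also accepts a `truncate_dim` argument on the `SentenceTransformer` constructor, which may be the more convenient route.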

Evaluation

Metrics

Semantic Similarity (sts-dev-512)

| Metric | Value |
|---|---|
| pearson_cosine | 0.8509 |
| spearman_cosine | 0.8549 |
| pearson_manhattan | 0.8213 |
| spearman_manhattan | 0.8365 |
| pearson_euclidean | 0.8210 |
| spearman_euclidean | 0.8382 |
| pearson_dot | 0.8491 |
| spearman_dot | 0.8560 |
| pearson_max | 0.8509 |
| spearman_max | 0.8560 |

Semantic Similarity (sts-dev-256)

| Metric | Value |
|---|---|
| pearson_cosine | 0.8498 |
| spearman_cosine | 0.8531 |
| pearson_manhattan | 0.8182 |
| spearman_manhattan | 0.8329 |
| pearson_euclidean | 0.8194 |
| spearman_euclidean | 0.8339 |
| pearson_dot | 0.8396 |
| spearman_dot | 0.8484 |
| pearson_max | 0.8498 |
| spearman_max | 0.8531 |
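
The metric names above pair a correlation (Pearson or Spearman) with a similarity function (cosine, Manhattan, Euclidean, dot); in both tables the `*_max` rows equal the largest value across the four similarity functions. As a rough illustration of what `pearson_cosine` and `spearman_cosine` measure, the sketch below correlates cosine similarities with gold scores on invented toy vectors. All function names and data here are hypothetical, and this is not the evaluator's actual code.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    # Spearman correlation is Pearson correlation computed on the ranks.
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy sentence-pair embeddings and invented gold similarity scores.
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([1.0, 0.0], [1.0, 1.0]),
    ([1.0, 0.0], [0.0, 1.0]),
]
gold = [1.0, 0.6, 0.1]
predicted = [cosine_similarity(u, v) for u, v in pairs]
pearson_cosine = pearson(predicted, gold)
spearman_cosine = spearman(predicted, gold)
print(pearson_cosine, spearman_cosine)
```

Spearman only looks at the ordering of the predictions, so it reaches 1.0 here even though the cosine values do not match the gold scores exactly, while Pearson stays slightly below 1.0.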

Training Details

Training Dataset

Unnamed Dataset

  • Size: 34,436 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    | | sentence1 | sentence2 | score |
    |---|---|---|---|
    | type | string | string | float |
    | details | min: 4 tokens; mean: 15.18 tokens; max: 42 tokens | min: 4 tokens; mean: 15.18 tokens; max: 42 tokens | min: 0.0; mean: 0.54; max: 1.0 |
  • Samples:
    | sentence1 | sentence2 | score |
    |---|---|---|
    | A woman picks up and holds a baby kangaroo in her arms. امرأة تحمل في ذراعها طفل كنغر | A woman picks up and holds a baby kangaroo. امرأة تحمل و تحمل طفل كنغر | 0.92 |
    | امرأة تحمل و تحمل طفل كنغر A woman picks up and holds a baby kangaroo. | امرأة تحمل في ذراعها طفل كنغر A woman picks up and holds a baby kangaroo in her arms. | 0.92 |
    | رجل يعزف على الناي | رجل يعزف على فرقة الخيزران | 0.77 |
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
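
The loss configuration above combines cosine similarity with `MSELoss`: the cosine similarity of the two sentence embeddings is compared with the gold `score` by mean squared error. A minimal pure-Python sketch of that computation on toy vectors follows; the helper names and data are hypothetical, and the real loss operates on batched tensors.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_similarity_mse(pairs: List[Tuple[Vector, Vector]], scores: List[float]) -> float:
    """Mean squared error between cos(u, v) and the gold score of each pair."""
    errors = [(cosine_similarity(u, v) - s) ** 2 for (u, v), s in zip(pairs, scores)]
    return sum(errors) / len(errors)


# Toy batch: an identical pair labelled 0.92 and an orthogonal pair labelled 0.0.
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # cosine similarity 1.0
    ([1.0, 0.0], [0.0, 1.0]),  # cosine similarity 0.0
]
scores = [0.92, 0.0]
loss = cosine_similarity_mse(pairs, scores)
print(loss)  # approximately 0.0032
```

Because the target is the raw score, the 0.0 to 1.0 range of the `score` column matters: the model is pushed to output cosine similarities on the same scale.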
    

Evaluation Dataset

Unnamed Dataset

  • Size: 100 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 100 samples:
    | | sentence1 | sentence2 | score |
    |---|---|---|---|
    | type | string | string | float |
    | details | min: 4 tokens; mean: 15.96 tokens; max: 43 tokens | min: 4 tokens; mean: 15.96 tokens; max: 43 tokens | min: 0.1; mean: 0.72; max: 1.0 |
  • Samples:
    | sentence1 | sentence2 | score |
    |---|---|---|
    | طائرة ستقلع | طائرة طيران تقلع | 1.0 |
    | طائرة طيران تقلع | طائرة ستقلع | 1.0 |
    | A plane is taking off. | An air plane is taking off. | 1.0 |
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 250
  • per_device_eval_batch_size: 10
  • learning_rate: 1e-06
  • num_train_epochs: 10
  • bf16: True
  • dataloader_drop_last: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
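
To make the effect of `learning_rate: 1e-06` with the linear scheduler (and no warmup, per the full list in this card) more concrete, here is a small sketch of a linear-decay schedule. It assumes a single training device, so the total number of optimizer steps is derived from the 34,436 samples, the batch size of 250, and the 10 epochs; `linear_lr` is a hypothetical helper that mirrors the usual linear warmup and decay formula, not code taken from the training run.

```python
def linear_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# dataloader_drop_last=True discards the final partial batch of each epoch.
steps_per_epoch = 34436 // 250        # 137 full batches (single device assumed)
total_steps = steps_per_epoch * 10    # num_train_epochs = 10

for step in (0, 685, 1370):
    print(step, linear_lr(step, 1e-06, total_steps))
```

Under these assumptions the learning rate starts at 1e-06, is roughly half that value midway through training, and reaches zero at the last step.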

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 250
  • per_device_eval_batch_size: 10
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-06
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch | Step | Training Loss | Validation Loss | sts-dev-512_spearman_cosine | sts-dev-256_spearman_cosine |
|---|---|---|---|---|---|
| 0.3650 | 50 | 0.0395 | 0.0424 | 0.8486 | 0.8487 |
| 0.7299 | 100 | 0.0310 | 0.0427 | 0.8493 | 0.8495 |
| 1.0949 | 150 | 0.0344 | 0.0430 | 0.8496 | 0.8496 |
| 1.4599 | 200 | 0.0313 | 0.0427 | 0.8506 | 0.8504 |
| 1.8248 | 250 | 0.0267 | 0.0428 | 0.8504 | 0.8506 |
| 2.1898 | 300 | 0.0309 | 0.0429 | 0.8516 | 0.8515 |
| 2.5547 | 350 | 0.0276 | 0.0425 | 0.8531 | 0.8521 |
| 2.9197 | 400 | 0.0280 | 0.0426 | 0.8530 | 0.8515 |
| 3.2847 | 450 | 0.0281 | 0.0425 | 0.8539 | 0.8521 |
| 3.6496 | 500 | 0.0248 | 0.0425 | 0.8542 | 0.8523 |
| 4.0146 | 550 | 0.0302 | 0.0424 | 0.8541 | 0.8520 |
| 4.3796 | 600 | 0.0261 | 0.0421 | 0.8545 | 0.8523 |
| 4.7445 | 650 | 0.0233 | 0.0420 | 0.8544 | 0.8522 |
| 5.1095 | 700 | 0.0281 | 0.0419 | 0.8547 | 0.8528 |
| 5.4745 | 750 | 0.0257 | 0.0419 | 0.8546 | 0.8531 |
| 5.8394 | 800 | 0.0235 | 0.0418 | 0.8546 | 0.8527 |
| 6.2044 | 850 | 0.0268 | 0.0418 | 0.8551 | 0.8529 |
| 6.5693 | 900 | 0.0238 | 0.0416 | 0.8552 | 0.8526 |
| 6.9343 | 950 | 0.0255 | 0.0416 | 0.8549 | 0.8526 |
| 7.2993 | 1000 | 0.0253 | 0.0416 | 0.8548 | 0.8528 |
| 7.6642 | 1050 | 0.0225 | 0.0415 | 0.8550 | 0.8525 |
| 8.0292 | 1100 | 0.0276 | 0.0414 | 0.8550 | 0.8528 |
| 8.3942 | 1150 | 0.0244 | 0.0415 | 0.8550 | 0.8533 |
| 8.7591 | 1200 | 0.0218 | 0.0414 | 0.8551 | 0.8529 |
| 9.1241 | 1250 | 0.0263 | 0.0414 | 0.8550 | 0.8531 |
| 9.4891 | 1300 | 0.0241 | 0.0414 | 0.8552 | 0.8533 |
| 9.8540 | 1350 | 0.0227 | 0.0415 | 0.8549 | 0.8531 |
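
The fractional Epoch values in the training log can be reproduced from the dataset size and batch size reported in this card. The sketch below assumes a single device and no gradient accumulation, so one optimizer step consumes one batch of 250 samples; `epoch_at_step` is a hypothetical helper written for this check.

```python
def epoch_at_step(step: int, num_samples: int, batch_size: int, drop_last: bool = True) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    if drop_last:
        steps_per_epoch = num_samples // batch_size      # partial batch discarded
    else:
        steps_per_epoch = -(-num_samples // batch_size)  # ceiling division
    return step / steps_per_epoch


# A few (step, epoch) pairs copied from the training log.
logged = {50: 0.3650, 100: 0.7299, 700: 5.1095, 1350: 9.8540}

computed = {step: round(epoch_at_step(step, 34436, 250), 4) for step in logged}
for step, epoch in logged.items():
    print(step, computed[step], epoch)
```

With 34,436 samples and `dataloader_drop_last: True`, each epoch has 137 steps, and the computed values agree with the logged ones to four decimal places.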

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.2.0
  • Transformers: 4.45.2
  • PyTorch: 2.3.1
  • Accelerate: 1.0.1
  • Datasets: 3.0.1
  • Tokenizers: 0.20.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}