all-mpnet-base-v2-pair_score

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
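
Because the Pooling module mean-pools the token embeddings and the final Normalize() module L2-normalizes the output, cosine similarity between two embeddings reduces to a plain dot product. A minimal sketch illustrating this, assuming the checkpoint is published as youssefkhalil320/all-mpnet-base-v2-pairscore:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("youssefkhalil320/all-mpnet-base-v2-pairscore")
emb = model.encode(["jeremy hush book", "chinese jumper"])

# Normalize() makes every vector unit length ...
print(np.linalg.norm(emb, axis=1))  # ~ [1. 1.]
# ... so a dot product is already the cosine similarity
print(emb[0] @ emb[1])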

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("youssefkhalil320/all-mpnet-base-v2-pairscore")
# Run inference
sentences = [
    'jeremy hush book',
    'chinese jumper',
    'perfume',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
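
Without the sentence-transformers wrapper, the same pipeline (MPNet encoder, mean pooling over the attention mask, then L2 normalization) can be reproduced with plain transformers. A hedged sketch, again assuming the youssefkhalil320/all-mpnet-base-v2-pairscore checkpoint:

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, ignoring padding positions
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

sentences = ['jeremy hush book', 'chinese jumper', 'perfume']

tokenizer = AutoTokenizer.from_pretrained("youssefkhalil320/all-mpnet-base-v2-pairscore")
model = AutoModel.from_pretrained("youssefkhalil320/all-mpnet-base-v2-pairscore")

encoded = tokenizer(sentences, padding=True, truncation=True, max_length=384, return_tensors='pt')
with torch.no_grad():
    output = model(**encoded)

# Mean pooling followed by L2 normalization, matching the Pooling and Normalize modules above
embeddings = F.normalize(mean_pooling(output, encoded['attention_mask']), p=2, dim=1)
print(embeddings.shape)  # torch.Size([3, 768])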

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
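
For reference, the hyperparameters listed above roughly correspond to the following SentenceTransformerTrainer setup. This is a reconstruction rather than the original training script: the CoSENTLoss choice is inferred from the citation section, and the pair-score dataset columns are placeholders.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Placeholder pair-score data: (sentence1, sentence2, similarity score)
train_dataset = Dataset.from_dict({
    "sentence1": ["jeremy hush book", "chinese jumper"],
    "sentence2": ["illustrated art book", "perfume"],
    "score": [0.8, 0.1],
})
eval_dataset = Dataset.from_dict({
    "sentence1": ["jeremy hush book"],
    "sentence2": ["perfume"],
    "score": [0.1],
})

loss = CoSENTLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="all-mpnet-base-v2-pair_score",
    num_train_epochs=2,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()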

Training Logs

A dash in the Validation Loss column means no evaluation ran at that step; validation loss was computed every 5,000 steps.

Epoch  Step  Training Loss  Validation Loss
0.0094 100 16.0755 -
0.0188 200 13.0643 -
0.0282 300 9.3474 -
0.0376 400 8.2606 -
0.0469 500 8.084 -
0.0563 600 8.0581 -
0.0657 700 8.0175 -
0.0751 800 8.0285 -
0.0845 900 8.0024 -
0.0939 1000 8.0161 -
0.1033 1100 7.9941 -
0.1127 1200 8.0233 -
0.1221 1300 8.0141 -
0.1314 1400 7.9644 -
0.1408 1500 8.0311 -
0.1502 1600 8.0306 -
0.1596 1700 7.989 -
0.1690 1800 8.0034 -
0.1784 1900 8.0107 -
0.1878 2000 7.9737 -
0.1972 2100 7.9827 -
0.2066 2200 8.0389 -
0.2159 2300 7.973 -
0.2253 2400 7.9669 -
0.2347 2500 8.0296 -
0.2441 2600 7.9984 -
0.2535 2700 7.9772 -
0.2629 2800 7.9838 -
0.2723 2900 7.9816 -
0.2817 3000 8.0021 -
0.2911 3100 7.9715 -
0.3004 3200 7.9809 -
0.3098 3300 7.9849 -
0.3192 3400 7.9463 -
0.3286 3500 8.0067 -
0.3380 3600 7.9431 -
0.3474 3700 7.9877 -
0.3568 3800 7.9494 -
0.3662 3900 7.9466 -
0.3756 4000 7.9708 -
0.3849 4100 7.9525 -
0.3943 4200 7.9322 -
0.4037 4300 7.9415 -
0.4131 4400 7.9932 -
0.4225 4500 7.9481 -
0.4319 4600 7.976 -
0.4413 4700 7.971 -
0.4507 4800 7.9647 -
0.4601 4900 7.9217 -
0.4694 5000 7.9374 7.9518
0.4788 5100 7.9026 -
0.4882 5200 7.9304 -
0.4976 5300 7.9148 -
0.5070 5400 7.9538 -
0.5164 5500 8.0002 -
0.5258 5600 7.9571 -
0.5352 5700 7.932 -
0.5445 5800 7.9047 -
0.5539 5900 7.9353 -
0.5633 6000 7.9203 -
0.5727 6100 7.8967 -
0.5821 6200 7.9414 -
0.5915 6300 7.9631 -
0.6009 6400 7.9606 -
0.6103 6500 7.9377 -
0.6197 6600 7.9108 -
0.6290 6700 7.9225 -
0.6384 6800 7.9154 -
0.6478 6900 7.9191 -
0.6572 7000 7.8903 -
0.6666 7100 7.9213 -
0.6760 7200 7.9202 -
0.6854 7300 7.8998 -
0.6948 7400 7.9153 -
0.7042 7500 7.9037 -
0.7135 7600 7.9146 -
0.7229 7700 7.8972 -
0.7323 7800 7.9374 -
0.7417 7900 7.8647 -
0.7511 8000 7.8915 -
0.7605 8100 7.8846 -
0.7699 8200 7.8988 -
0.7793 8300 7.8702 -
0.7887 8400 7.923 -
0.7980 8500 7.891 -
0.8074 8600 7.8832 -
0.8168 8700 7.8726 -
0.8262 8800 7.8813 -
0.8356 8900 7.8986 -
0.8450 9000 7.8743 -
0.8544 9100 7.8791 -
0.8638 9200 7.8783 -
0.8732 9300 7.8528 -
0.8825 9400 7.8864 -
0.8919 9500 7.8989 -
0.9013 9600 7.8617 -
0.9107 9700 7.8371 -
0.9201 9800 7.8566 -
0.9295 9900 7.8776 -
0.9389 10000 7.8558 7.8492
0.9483 10100 7.848 -
0.9577 10200 7.8227 -
0.9670 10300 7.8311 -
0.9764 10400 7.8437 -
0.9858 10500 7.8454 -
0.9952 10600 7.8362 -
1.0046 10700 7.8681 -
1.0140 10800 7.8745 -
1.0234 10900 7.8339 -
1.0328 11000 7.8458 -
1.0422 11100 7.8493 -
1.0515 11200 7.8317 -
1.0609 11300 7.841 -
1.0703 11400 7.8292 -
1.0797 11500 7.8121 -
1.0891 11600 7.8165 -
1.0985 11700 7.8259 -
1.1079 11800 7.8303 -
1.1173 11900 7.809 -
1.1267 12000 7.818 -
1.1360 12100 7.8071 -
1.1454 12200 7.801 -
1.1548 12300 7.8123 -
1.1642 12400 7.8203 -
1.1736 12500 7.8609 -
1.1830 12600 7.7782 -
1.1924 12700 7.8092 -
1.2018 12800 7.815 -
1.2112 12900 7.8196 -
1.2205 13000 7.8206 -
1.2299 13100 7.8022 -
1.2393 13200 7.8043 -
1.2487 13300 7.7823 -
1.2581 13400 7.8061 -
1.2675 13500 7.8016 -
1.2769 13600 7.8076 -
1.2863 13700 7.7996 -
1.2957 13800 7.8035 -
1.3050 13900 7.8092 -
1.3144 14000 7.7902 -
1.3238 14100 7.8114 -
1.3332 14200 7.8112 -
1.3426 14300 7.8036 -
1.3520 14400 7.8178 -
1.3614 14500 7.8391 -
1.3708 14600 7.8151 -
1.3802 14700 7.7957 -
1.3895 14800 7.7833 -
1.3989 14900 7.8049 -
1.4083 15000 7.8163 7.8078
1.4177 15100 7.7864 -
1.4271 15200 7.8241 -
1.4365 15300 7.7694 -
1.4459 15400 7.7784 -
1.4553 15500 7.7628 -
1.4647 15600 7.8044 -
1.4740 15700 7.7871 -
1.4834 15800 7.809 -
1.4928 15900 7.7955 -
1.5022 16000 7.8056 -
1.5116 16100 7.774 -
1.5210 16200 7.7874 -
1.5304 16300 7.7918 -
1.5398 16400 7.7787 -
1.5492 16500 7.7881 -
1.5585 16600 7.7723 -
1.5679 16700 7.7809 -
1.5773 16800 7.8096 -
1.5867 16900 7.7559 -
1.5961 17000 7.8063 -
1.6055 17100 7.8137 -
1.6149 17200 7.761 -
1.6243 17300 7.7672 -
1.6336 17400 7.7939 -
1.6430 17500 7.8052 -
1.6524 17600 7.7519 -
1.6618 17700 7.7643 -
1.6712 17800 7.7823 -
1.6806 17900 7.7507 -
1.6900 18000 7.777 -
1.6994 18100 7.786 -
1.7088 18200 7.8097 -
1.7181 18300 7.7749 -
1.7275 18400 7.7626 -
1.7369 18500 7.7783 -
1.7463 18600 7.7552 -
1.7557 18700 7.7837 -
1.7651 18800 7.7583 -
1.7745 18900 7.7617 -
1.7839 19000 7.7649 -
1.7933 19100 7.7767 -
1.8026 19200 7.7565 -
1.8120 19300 7.7702 -
1.8214 19400 7.7552 -
1.8308 19500 7.7511 -
1.8402 19600 7.7818 -
1.8496 19700 7.7704 -
1.8590 19800 7.7824 -
1.8684 19900 7.751 -
1.8778 20000 7.7868 7.7942
1.8871 20100 7.7981 -
1.8965 20200 7.7673 -
1.9059 20300 7.7695 -
1.9153 20400 7.7587 -
1.9247 20500 7.7444 -
1.9341 20600 7.7736 -
1.9435 20700 7.7655 -
1.9529 20800 7.7686 -
1.9623 20900 7.7731 -
1.9716 21000 7.7527 -
1.9810 21100 7.7962 -
1.9904 21200 7.7676 -
1.9998 21300 7.7641 -

Framework Versions

  • Python: 3.8.10
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.4.1+cu118
  • Accelerate: 1.0.1
  • Datasets: 3.0.1
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}