SentenceTransformer based on sentence-transformers/stsb-distilbert-base

This is a sentence-transformers model finetuned from sentence-transformers/stsb-distilbert-base on the quora-duplicates dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description
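
  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/stsb-distilbert-base
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Number of Parameters: ~66.4M
  • Training Dataset: quora-duplicates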

Model Sources
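
  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)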

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
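
The Transformer module produces one DistilBERT embedding per token, and the Pooling module averages those token embeddings (masking out padding) into a single 768-dimensional sentence vector. Below is a rough, illustrative equivalent of that pooling step using the transformers library directly; SentenceTransformer performs it internally, so this is only a sketch of the mechanics:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("omega5505/stsb-distilbert-base-ocl")
bert = AutoModel.from_pretrained("omega5505/stsb-distilbert-base-ocl")

encoded = tokenizer(["How do I learn Python?"], padding=True, truncation=True,
                    max_length=128, return_tensors="pt")
with torch.no_grad():
    token_embeddings = bert(**encoded).last_hidden_state  # (batch, seq_len, 768)

# Mean pooling: average the token embeddings, using the attention mask
# so that padding tokens do not contribute
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])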

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("omega5505/stsb-distilbert-base-ocl")
# Run inference
sentences = [
    'Why do so many religious people believe in healing miracles?',
    'Is believing in God a bad thing?',
    'What do you like about China?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
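
The similarity method applies the model's similarity function, which is cosine similarity here (all metrics reported below are cosine-based), so the diagonal of the 3×3 matrix is 1.0 and the two religion-related questions should score higher against each other than either does against the question about China.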

Evaluation

Metrics

Binary Classification

Metric Value
cosine_accuracy 0.877
cosine_accuracy_threshold 0.7857
cosine_f1 0.8516
cosine_f1_threshold 0.7746
cosine_precision 0.8209
cosine_recall 0.8847
cosine_ap 0.8988
cosine_mcc 0.7484
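
These metrics come from treating duplicate detection as binary classification: a pair is predicted to be a duplicate when its cosine similarity exceeds a threshold tuned on the evaluation data. A minimal sketch of using the model this way; the example pairs are illustrative, and 0.7857 is the reported cosine_accuracy_threshold:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("omega5505/stsb-distilbert-base-ocl")

pairs = [
    ("How do I learn to code?", "What is the best way to learn programming?"),
    ("How do I learn to code?", "What should I eat for breakfast?"),
]
emb1 = model.encode([p[0] for p in pairs])
emb2 = model.encode([p[1] for p in pairs])

# One cosine score per pair; scores at or above the threshold are predicted duplicates
scores = model.similarity_pairwise(emb1, emb2)
print(scores >= 0.7857)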

Paraphrase Mining

Metric Value
average_precision 0.5483
f1 0.5606
precision 0.5539
recall 0.5675
threshold 0.8632
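
Here the model mines paraphrase pairs out of an unlabeled sentence collection rather than scoring predefined pairs. A minimal sketch with the paraphrase_mining utility from sentence-transformers; the example sentences are illustrative:

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import paraphrase_mining

model = SentenceTransformer("omega5505/stsb-distilbert-base-ocl")
sentences = [
    "How can I be a good geologist?",
    "What should I do to be a great geologist?",
    "How do I read and find my YouTube comments?",
]
# Returns (score, i, j) triples sorted by decreasing cosine similarity
for score, i, j in paraphrase_mining(model, sentences):
    print(f"{score:.4f}  {sentences[i]}  <->  {sentences[j]}")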

Information Retrieval

Metric Value
cosine_accuracy@1 0.9308
cosine_accuracy@3 0.969
cosine_accuracy@5 0.9778
cosine_accuracy@10 0.9854
cosine_precision@1 0.9308
cosine_precision@3 0.4145
cosine_precision@5 0.267
cosine_precision@10 0.1414
cosine_recall@1 0.8009
cosine_recall@3 0.9314
cosine_recall@5 0.9558
cosine_recall@10 0.9744
cosine_ndcg@10 0.9511
cosine_mrr@10 0.9512
cosine_map@100 0.9391
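
The retrieval metrics measure how highly the duplicate of a query question is ranked within a candidate pool. A minimal semantic search sketch using the semantic_search utility; the corpus and query are illustrative:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("omega5505/stsb-distilbert-base-ocl")

corpus = [
    "How do I improve my English speaking skills?",
    "What is the best way to invest in stocks?",
    "How can I lose weight quickly?",
]
corpus_embeddings = model.encode(corpus)
query_embeddings = model.encode(["How can I get better at speaking English?"])

# Each query gets a ranked list of {'corpus_id', 'score'} hits
for hit in util.semantic_search(query_embeddings, corpus_embeddings, top_k=3)[0]:
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")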

Training Details

Training Dataset

quora-duplicates

  • Dataset: quora-duplicates at 451a485
  • Size: 404,290 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min 6, mean 15.73, max 65 tokens
    • sentence2: string; min 6, mean 15.93, max 85 tokens
    • label: int; 0: ~61.60%, 1: ~38.40%
  • Samples (sentence1 | sentence2 | label):
    • How can Trump supporters claim he didn't mock a disabled reporter when there is live footage of him mocking a disabled reporter? | Why don't people actually watch the Trump video of him allegedly mocking a disabled reporter? | 0
    • Where can I get the best digital marketing course (online & offline) in India? | Which is the best digital marketing institute for professionals in India? | 1
    • What best two liner shayri? | What does "senile dementia, uncomplicated" mean in medical terms? | 0
  • Loss: OnlineContrastiveLoss
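
OnlineContrastiveLoss is a contrastive loss variant that only backpropagates through the hard pairs in each batch: positive pairs with low similarity and negative pairs with high similarity. A minimal sketch of the dataset and loss setup, assuming the pair-class subset of sentence-transformers/quora-duplicates on the Hugging Face Hub is the dataset described above:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("sentence-transformers/stsb-distilbert-base")

# (sentence1, sentence2, label) rows; label 1 marks a duplicate pair
train_dataset = load_dataset("sentence-transformers/quora-duplicates", "pair-class", split="train")

# Contrastive loss computed only over hard positives/negatives within each batch
loss = losses.OnlineContrastiveLoss(model)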

Evaluation Dataset

quora-duplicates

  • Dataset: quora-duplicates at 451a485
  • Size: 404,290 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min 6, mean 16.14, max 70 tokens
    • sentence2: string; min 6, mean 15.92, max 74 tokens
    • label: int; 0: ~60.10%, 1: ~39.90%
  • Samples (sentence1 | sentence2 | label):
    • What are some must subscribe RSS feeds? | What are RSS feeds? | 0
    • How close are Madonna and Hillary Clinton? | Why do people say Hillary Clinton is a crook? | 0
    • Can you share best day of your life? | What is the Best Day of your life till date? | 1
  • Loss: OnlineContrastiveLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
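
These settings map directly onto SentenceTransformerTrainingArguments. A minimal end-to-end training sketch under those hyperparameters; the output_dir and the held-out evaluation split are assumptions, not taken from the card:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/stsb-distilbert-base")

# Hypothetical split into train and evaluation sets
dataset = load_dataset("sentence-transformers/quora-duplicates", "pair-class", split="train")
dataset = dataset.train_test_split(test_size=1000, seed=42)

args = SentenceTransformerTrainingArguments(
    output_dir="stsb-distilbert-base-ocl",  # hypothetical output path
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no duplicate samples within a batch
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    loss=losses.OnlineContrastiveLoss(model),
)
trainer.train()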

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch  | Step | Training Loss | Validation Loss | quora-duplicates_cosine_ap | quora-duplicates-dev_average_precision | cosine_ndcg@10
0      | 0    | -      | -      | 0.7458 | 0.4200 | 0.9390
0.0640 | 100  | 2.5263 | -      | -      | -      | -
0.1280 | 200  | 2.1489 | -      | -      | -      | -
0.1599 | 250  | -      | 1.8621 | 0.8433 | 0.3907 | 0.9329
0.1919 | 300  | 2.0353 | -      | -      | -      | -
0.2559 | 400  | 1.7831 | -      | -      | -      | -
0.3199 | 500  | 1.8887 | 1.7744 | 0.8662 | 0.4924 | 0.9379
0.3839 | 600  | 1.7814 | -      | -      | -      | -
0.4479 | 700  | 1.7775 | -      | -      | -      | -
0.4798 | 750  | -      | 1.6468 | 0.8766 | 0.4945 | 0.9399
0.5118 | 800  | 1.6835 | -      | -      | -      | -
0.5758 | 900  | 1.6974 | -      | -      | -      | -
0.6398 | 1000 | 1.5704 | 1.4925 | 0.8895 | 0.5283 | 0.9460
0.7038 | 1100 | 1.6771 | -      | -      | -      | -
0.7678 | 1200 | 1.6190 | -      | -      | -      | -
0.7997 | 1250 | -      | 1.4311 | 0.8982 | 0.5252 | 0.9466
0.8317 | 1300 | 1.6119 | -      | -      | -      | -
0.8957 | 1400 | 1.6043 | -      | -      | -      | -
0.9597 | 1500 | 1.6848 | 1.4070 | 0.8988 | 0.5483 | 0.9511
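
Over the single training epoch, validation loss falls from 1.8621 (step 250) to 1.4070 (step 1500), while quora-duplicates_cosine_ap rises from 0.7458 to 0.8988 and cosine_ndcg@10 from 0.9390 to 0.9511; the final checkpoint corresponds to the metrics reported under Evaluation.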

Framework Versions

  • Python: 3.9.18
  • Sentence Transformers: 3.4.1
  • Transformers: 4.44.2
  • PyTorch: 2.2.1+cu121
  • Accelerate: 1.3.0
  • Datasets: 2.19.0
  • Tokenizers: 0.19.1
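
To approximate this environment, the core libraries can be pinned to the versions above (a sketch; note the PyTorch build above also targets CUDA 12.1):

pip install "sentence-transformers==3.4.1" "transformers==4.44.2" "torch==2.2.1" "accelerate==1.3.0" "datasets==2.19.0" "tokenizers==0.19.1"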

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}