CrossEncoder based on distilbert/distilroberta-base

This is a Cross Encoder model finetuned from distilbert/distilroberta-base on the stsb dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-distilroberta-base-stsb")
# Get scores for pairs...
pairs = [
    ['A man with a hard hat is dancing.', 'A man wearing a hard hat is dancing.'],
    ['A young child is riding a horse.', 'A child is riding a horse.'],
    ['A man is feeding a mouse to a snake.', 'The man is feeding a mouse to the snake.'],
    ['A woman is playing the guitar.', 'A man is playing guitar.'],
    ['A woman is playing the flute.', 'A man is playing a flute.'],
]
scores = model.predict(pairs)
print(scores.shape)
# [5]

# ... or rank different texts based on similarity to a single text
ranks = model.rank(
    'A man with a hard hat is dancing.',
    [
        'A man wearing a hard hat is dancing.',
        'A child is riding a horse.',
        'The man is feeding a mouse to the snake.',
        'A man is playing guitar.',
        'A man is playing a flute.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]

Evaluation

Metrics

Cross Encoder Correlation

Metric stsb-validation stsb-test
pearson 0.8773 0.8503
spearman 0.8754 0.8389

Training Details

Training Dataset

stsb

  • Dataset: stsb at ab7a5ac
  • Size: 5,749 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    sentence1 sentence2 score
    type string string float
    details
    • min: 16 characters
    • mean: 31.92 characters
    • max: 113 characters
    • min: 16 characters
    • mean: 31.51 characters
    • max: 94 characters
    • min: 0.0
    • mean: 0.45
    • max: 1.0
  • Samples:
    sentence1 sentence2 score
    A plane is taking off. An air plane is taking off. 1.0
    A man is playing a large flute. A man is playing a flute. 0.76
    A man is spreading shreded cheese on a pizza. A man is spreading shredded cheese on an uncooked pizza. 0.76
  • Loss: BinaryCrossEntropyLoss

Evaluation Dataset

stsb

  • Dataset: stsb at ab7a5ac
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    sentence1 sentence2 score
    type string string float
    details
    • min: 12 characters
    • mean: 57.37 characters
    • max: 144 characters
    • min: 17 characters
    • mean: 56.84 characters
    • max: 141 characters
    • min: 0.0
    • mean: 0.42
    • max: 1.0
  • Samples:
    sentence1 sentence2 score
    A man with a hard hat is dancing. A man wearing a hard hat is dancing. 1.0
    A young child is riding a horse. A child is riding a horse. 0.95
    A man is feeding a mouse to a snake. The man is feeding a mouse to the snake. 1.0
  • Loss: BinaryCrossEntropyLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • bf16: True

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss stsb-validation_spearman stsb-test_spearman
-1 -1 - - -0.0150 -
0.2222 20 0.6905 - - -
0.4444 40 0.6548 - - -
0.6667 60 0.5906 - - -
0.8889 80 0.5631 0.5475 0.8589 -
1.1111 100 0.5517 - - -
1.3333 120 0.5473 - - -
1.5556 140 0.5454 - - -
1.7778 160 0.5402 0.5346 0.8760 -
2.0 180 0.542 - - -
2.2222 200 0.5229 - - -
2.4444 220 0.524 - - -
2.6667 240 0.5286 0.5373 0.8744 -
2.8889 260 0.5236 - - -
3.1111 280 0.5269 - - -
3.3333 300 0.5209 - - -
3.5556 320 0.5115 0.5409 0.8754 -
3.7778 340 0.5149 - - -
4.0 360 0.5084 - - -
-1 -1 - - - 0.8389

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.007 kWh
  • Carbon Emitted: 0.003 kg of CO2
  • Hours Used: 0.031 hours

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.5.0.dev0
  • Transformers: 4.49.0.dev0
  • PyTorch: 2.5.0+cu121
  • Accelerate: 1.3.0
  • Datasets: 2.20.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Downloads last month
0
Safetensors
Model size
82.1M params
Tensor type
F32
·
Inference Providers NEW
This model is not currently available via any of the supported third-party Inference Providers, and the HF Inference API does not support sentence-transformers models with pipeline type text-classification

Model tree for tomaarsen/reranker-distilroberta-base-stsb

Finetuned
(588)
this model

Dataset used to train tomaarsen/reranker-distilroberta-base-stsb

Evaluation results