---
language:
  - en
tags:
  - sentence-transformers
  - cross-encoder
  - text-classification
  - generated_from_trainer
  - dataset_size:5749
  - loss:BinaryCrossEntropyLoss
base_model: distilbert/distilroberta-base
datasets:
  - sentence-transformers/stsb
pipeline_tag: text-classification
library_name: sentence-transformers
metrics:
  - pearson
  - spearman
co2_eq_emissions:
  emissions: 2.6550346776830636
  energy_consumed: 0.006830514578476734
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.031
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
  - name: CrossEncoder based on distilbert/distilroberta-base
    results:
      - task:
          type: cross-encoder-correlation
          name: Cross Encoder Correlation
        dataset:
          name: stsb validation
          type: stsb-validation
        metrics:
          - type: pearson
            value: 0.877295960646044
            name: Pearson
          - type: spearman
            value: 0.8754151440157509
            name: Spearman
      - task:
          type: cross-encoder-correlation
          name: Cross Encoder Correlation
        dataset:
          name: stsb test
          type: stsb-test
        metrics:
          - type: pearson
            value: 0.8503341584157813
            name: Pearson
          - type: spearman
            value: 0.8388642249054395
            name: Spearman
---

CrossEncoder based on distilbert/distilroberta-base

This is a Cross Encoder model finetuned from distilbert/distilroberta-base on the stsb dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: distilbert/distilroberta-base
  • Number of Output Labels: 1
  • Training Dataset: sentence-transformers/stsb
  • Language: en

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: sentence-transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-distilroberta-base-stsb")
# Get scores for pairs...
pairs = [
    ['A man with a hard hat is dancing.', 'A man wearing a hard hat is dancing.'],
    ['A young child is riding a horse.', 'A child is riding a horse.'],
    ['A man is feeding a mouse to a snake.', 'The man is feeding a mouse to the snake.'],
    ['A woman is playing the guitar.', 'A man is playing guitar.'],
    ['A woman is playing the flute.', 'A man is playing a flute.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# ... or rank different texts based on similarity to a single text
ranks = model.rank(
    'A man with a hard hat is dancing.',
    [
        'A man wearing a hard hat is dancing.',
        'A child is riding a horse.',
        'The man is feeding a mouse to the snake.',
        'A man is playing guitar.',
        'A man is playing a flute.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
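
The entries returned by model.rank are sorted by score, highest first, so iterating over them lists the best match first. A small follow-up sketch (documents simply restates the candidate list passed to model.rank above):

documents = [
    'A man wearing a hard hat is dancing.',
    'A child is riding a horse.',
    'The man is feeding a mouse to the snake.',
    'A man is playing guitar.',
    'A man is playing a flute.',
]
for entry in ranks:
    # 'corpus_id' indexes into the candidate list, 'score' is the model's similarity
    print(f"{entry['score']:.4f}\t{documents[entry['corpus_id']]}")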

Evaluation

Metrics

Cross Encoder Correlation

| Metric   | stsb-validation | stsb-test |
|:---------|:---------------:|:---------:|
| pearson  | 0.8773          | 0.8503    |
| spearman | 0.8754          | 0.8389    |
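
These correlations can be recomputed without the library's built-in evaluator. A minimal sketch, assuming the sentence-transformers/stsb validation split (the same columns are described under Training Details below) and using scipy for the statistics:

from datasets import load_dataset
from scipy.stats import pearsonr, spearmanr
from sentence_transformers import CrossEncoder

model = CrossEncoder("tomaarsen/reranker-distilroberta-base-stsb")
eval_dataset = load_dataset("sentence-transformers/stsb", split="validation")

# Score every (sentence1, sentence2) pair, then correlate with the gold scores
pairs = list(zip(eval_dataset["sentence1"], eval_dataset["sentence2"]))
predictions = model.predict(pairs)
print("Pearson: ", pearsonr(predictions, eval_dataset["score"])[0])
print("Spearman:", spearmanr(predictions, eval_dataset["score"])[0])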

Training Details

Training Dataset

stsb

  • Dataset: stsb at ab7a5ac
  • Size: 5,749 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:

    |         | sentence1                                                       | sentence2                                                      | score                          |
    |:--------|:----------------------------------------------------------------|:---------------------------------------------------------------|:-------------------------------|
    | type    | string                                                           | string                                                          | float                          |
    | details | min: 16 characters, mean: 31.92 characters, max: 113 characters  | min: 16 characters, mean: 31.51 characters, max: 94 characters  | min: 0.0, mean: 0.45, max: 1.0 |

  • Samples:

    | sentence1                                     | sentence2                                                | score |
    |:----------------------------------------------|:---------------------------------------------------------|:------|
    | A plane is taking off.                        | An air plane is taking off.                              | 1.0   |
    | A man is playing a large flute.               | A man is playing a flute.                                | 0.76  |
    | A man is spreading shreded cheese on a pizza. | A man is spreading shredded cheese on an uncooked pizza. | 0.76  |
  • Loss: BinaryCrossEntropyLoss
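
BinaryCrossEntropyLoss compares the model's single output logit, passed through a sigmoid, against the gold similarity score in [0, 1]. A minimal sketch of the underlying computation, with hypothetical logits and PyTorch's built-in loss function:

import torch

# Hypothetical raw model outputs (logits) for two sentence pairs
logits = torch.tensor([4.0, 1.2])
# Gold similarity scores from the dataset, already normalized to [0, 1]
labels = torch.tensor([1.0, 0.76])

# Functional form of BCEWithLogitsLoss: sigmoid(logit) vs. soft label
loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
print(loss)  # mean binary cross-entropy over the batch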

Evaluation Dataset

stsb

  • Dataset: stsb at ab7a5ac
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:

    |         | sentence1                                                       | sentence2                                                       | score                          |
    |:--------|:----------------------------------------------------------------|:-----------------------------------------------------------------|:-------------------------------|
    | type    | string                                                           | string                                                            | float                          |
    | details | min: 12 characters, mean: 57.37 characters, max: 144 characters  | min: 17 characters, mean: 56.84 characters, max: 141 characters   | min: 0.0, mean: 0.42, max: 1.0 |

  • Samples:

    | sentence1                            | sentence2                                 | score |
    |:-------------------------------------|:------------------------------------------|:------|
    | A man with a hard hat is dancing.    | A man wearing a hard hat is dancing.      | 1.0   |
    | A young child is riding a horse.     | A child is riding a horse.                | 0.95  |
    | A man is feeding a mouse to a snake. | The man is feeding a mouse to the snake.  | 1.0   |
  • Loss: BinaryCrossEntropyLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • bf16: True
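
A training run with these settings can be reproduced roughly as follows. This is a hedged sketch assuming the CrossEncoderTrainer API of recent sentence-transformers releases; class names in the exact development version listed under Framework Versions may differ:

from datasets import load_dataset
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

# Single-score cross encoder on top of the base checkpoint
model = CrossEncoder("distilbert/distilroberta-base", num_labels=1)
train_dataset = load_dataset("sentence-transformers/stsb", split="train")
eval_dataset = load_dataset("sentence-transformers/stsb", split="validation")

args = CrossEncoderTrainingArguments(
    output_dir="reranker-distilroberta-base-stsb",
    num_train_epochs=4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    warmup_ratio=0.1,
    bf16=True,
    eval_strategy="steps",
)

trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=BinaryCrossEntropyLoss(model),
)
trainer.train()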

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch  | Step | Training Loss | Validation Loss | stsb-validation_spearman | stsb-test_spearman |
|:------:|:----:|:-------------:|:---------------:|:------------------------:|:------------------:|
| -1     | -1   | -             | -               | -0.0150                  | -                  |
| 0.2222 | 20   | 0.6905        | -               | -                        | -                  |
| 0.4444 | 40   | 0.6548        | -               | -                        | -                  |
| 0.6667 | 60   | 0.5906        | -               | -                        | -                  |
| 0.8889 | 80   | 0.5631        | 0.5475          | 0.8589                   | -                  |
| 1.1111 | 100  | 0.5517        | -               | -                        | -                  |
| 1.3333 | 120  | 0.5473        | -               | -                        | -                  |
| 1.5556 | 140  | 0.5454        | -               | -                        | -                  |
| 1.7778 | 160  | 0.5402        | 0.5346          | 0.8760                   | -                  |
| 2.0    | 180  | 0.542         | -               | -                        | -                  |
| 2.2222 | 200  | 0.5229        | -               | -                        | -                  |
| 2.4444 | 220  | 0.524         | -               | -                        | -                  |
| 2.6667 | 240  | 0.5286        | 0.5373          | 0.8744                   | -                  |
| 2.8889 | 260  | 0.5236        | -               | -                        | -                  |
| 3.1111 | 280  | 0.5269        | -               | -                        | -                  |
| 3.3333 | 300  | 0.5209        | -               | -                        | -                  |
| 3.5556 | 320  | 0.5115        | 0.5409          | 0.8754                   | -                  |
| 3.7778 | 340  | 0.5149        | -               | -                        | -                  |
| 4.0    | 360  | 0.5084        | -               | -                        | -                  |
| -1     | -1   | -             | -               | -                        | 0.8389             |

The two rows with Epoch -1 are the evaluation of the untrained base model before training and the final evaluation on the test set after training, respectively.

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.007 kWh
  • Carbon Emitted: 0.003 kg of CO2
  • Hours Used: 0.031 hours
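
CodeCarbon estimates emissions by sampling hardware power draw for the duration of a run. A minimal sketch of how such a measurement is typically wrapped around training (generic tracker settings, not necessarily the exact configuration used for this run):

from codecarbon import EmissionsTracker

tracker = EmissionsTracker()
tracker.start()
# ... run the training loop here ...
emissions_kg = tracker.stop()  # estimated kg of CO2-equivalent
print(f"{emissions_kg:.6f} kg CO2eq")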

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.5.0.dev0
  • Transformers: 4.49.0.dev0
  • PyTorch: 2.5.0+cu121
  • Accelerate: 1.3.0
  • Datasets: 2.20.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}