stsb-bert-tiny adapter finetuned on GooAQ pairs
This is a sentence-transformers model finetuned from sentence-transformers-testing/stsb-bert-tiny-safetensors on the gooaq dataset. It maps sentences & paragraphs to a 128-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
This model was trained using train_script.py.
Model Details
Model Description
Model Sources
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 128, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
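The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence embedding is the average of the token vectors. A minimal sketch of that idea in plain Python follows; the function name `mean_pool`, the toy 4-dimensional vectors, and the mask handling are illustrative assumptions, not the library's own implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in summed]


# Toy input (hypothetical): three tokens with 4-dimensional vectors,
# where the last token is padding and must not affect the result.
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 2.0, 2.0, 2.0]
```

In the real model the token vectors are 128-dimensional, which is why the final embedding also has 128 dimensions.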
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
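from sentence_transformers import SentenceTransformer

# Load the model by its Hugging Face Hub name
model = SentenceTransformer("sentence-transformers-testing/stsb-bert-tiny-lora")
# Example inputs: one question and two candidate passages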
sentences = [
"how to reverse a video on tiktok that's not yours?",
'[\'Tap "Effects" at the bottom of your screen — it\\\'s an icon that looks like a clock. Open the Effects menu. ... \', \'At the end of the new list that appears, tap "Time." Select "Time" at the end. ... \', \'Select "Reverse" — you\\\'ll then see a preview of your new, reversed video appear on the screen.\']',
'Relative age is the age of a rock layer (or the fossils it contains) compared to other layers. It can be determined by looking at the position of rock layers. Absolute age is the numeric age of a layer of rocks or fossils. Absolute age can be determined by using radiometric dating.',
]
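# Encode each sentence into a 128-dimensional embedding
embeddings = model.encode(sentences)
print(embeddings.shape)  # 3 sentences x 128 dimensions

# Score every sentence against every other sentence
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # 3 x 3 matrix of pairwise scores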
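The `model.similarity` call returns one score per pair of inputs. Assuming the model scores pairs by cosine similarity (an assumption, suggested by the cosine-based metrics reported on this card), the computation can be sketched in plain Python as below. The helper names and the 2-dimensional toy vectors are hypothetical and stand in for real 128-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarity of every vector against every other."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy embeddings (hypothetical values, not produced by the model).
toy_embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(value, 3) for value in row])
```

Each vector scores 1.0 against itself, orthogonal vectors score 0.0, and vector length does not affect the score.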
Evaluation
Metrics
Information Retrieval
- Datasets: NanoClimateFEVER, NanoDBPedia, NanoFEVER, NanoFiQA2018, NanoHotpotQA, NanoMSMARCO, NanoNFCorpus, NanoNQ, NanoQuoraRetrieval, NanoSCIDOCS, NanoArguAna, NanoSciFact and NanoTouche2020
- Evaluated with InformationRetrievalEvaluator
| Metric | NanoClimateFEVER | NanoDBPedia | NanoFEVER | NanoFiQA2018 | NanoHotpotQA | NanoMSMARCO | NanoNFCorpus | NanoNQ | NanoQuoraRetrieval | NanoSCIDOCS | NanoArguAna | NanoSciFact | NanoTouche2020 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cosine_accuracy@1 | 0.14 | 0.42 | 0.12 | 0.06 | 0.36 | 0.06 | 0.2 | 0.08 | 0.7 | 0.18 | 0.08 | 0.08 | 0.2041 |
| cosine_accuracy@3 | 0.22 | 0.62 | 0.18 | 0.1 | 0.52 | 0.26 | 0.26 | 0.18 | 0.82 | 0.26 | 0.26 | 0.22 | 0.5102 |
| cosine_accuracy@5 | 0.26 | 0.72 | 0.22 | 0.2 | 0.54 | 0.32 | 0.3 | 0.2 | 0.88 | 0.32 | 0.32 | 0.3 | 0.7551 |
| cosine_accuracy@10 | 0.38 | 0.86 | 0.36 | 0.28 | 0.62 | 0.36 | 0.44 | 0.42 | 0.94 | 0.4 | 0.4 | 0.32 | 0.8776 |
| cosine_precision@1 | 0.14 | 0.42 | 0.12 | 0.06 | 0.36 | 0.06 | 0.2 | 0.08 | 0.7 | 0.18 | 0.08 | 0.08 | 0.2041 |
| cosine_precision@3 | 0.08 | 0.34 | 0.06 | 0.04 | 0.2067 | 0.0867 | 0.12 | 0.06 | 0.32 | 0.12 | 0.0867 | 0.0733 | 0.2517 |
| cosine_precision@5 | 0.056 | 0.344 | 0.044 | 0.048 | 0.14 | 0.064 | 0.096 | 0.04 | 0.224 | 0.092 | 0.064 | 0.064 | 0.2531 |
| cosine_precision@10 | 0.05 | 0.29 | 0.036 | 0.032 | 0.078 | 0.036 | 0.08 | 0.042 | 0.118 | 0.066 | 0.04 | 0.034 | 0.2449 |
| cosine_recall@1 | 0.0567 | 0.0263 | 0.12 | 0.044 | 0.18 | 0.06 | 0.0038 | 0.08 | 0.624 | 0.036 | 0.08 | 0.08 | 0.0144 |
| cosine_recall@3 | 0.0867 | 0.0604 | 0.18 | 0.062 | 0.31 | 0.26 | 0.0073 | 0.17 | 0.772 | 0.0747 | 0.26 | 0.195 | 0.0488 |
| cosine_recall@5 | 0.1117 | 0.1027 | 0.22 | 0.1249 | 0.35 | 0.32 | 0.0127 | 0.19 | 0.866 | 0.0947 | 0.32 | 0.28 | 0.0793 |
| cosine_recall@10 | 0.1783 | 0.1961 | 0.34 | 0.1557 | 0.39 | 0.36 | 0.0193 | 0.4 | 0.8993 | 0.1347 | 0.4 | 0.3 | 0.1465 |
| cosine_ndcg@10 | 0.1412 | 0.3415 | 0.2122 | 0.104 | 0.3505 | 0.2142 | 0.0987 | 0.2052 | 0.7993 | 0.1348 | 0.2375 | 0.1937 | 0.2486 |
| cosine_mrr@10 | 0.1994 | 0.5504 | 0.1749 | 0.1082 | 0.4476 | 0.1667 | 0.2539 | 0.1507 | 0.7798 | 0.2421 | 0.1857 | 0.1647 | 0.4082 |
| cosine_map@100 | 0.1136 | 0.2113 | 0.1886 | 0.0804 | 0.2931 | 0.1916 | 0.0189 | 0.161 | 0.7635 | 0.1026 | 0.1985 | 0.1654 | 0.1638 |
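To make the metric names in this table concrete, here is a small sketch of accuracy@k, precision@k, recall@k, MRR@k and NDCG@k for a single query. It uses textbook definitions with binary relevance; the function name, document ids and ranking are hypothetical, and the evaluator used for this card may differ in details such as tie handling.

```python
import math


def retrieval_metrics(ranked_ids, relevant_ids, k):
    """Textbook top-k retrieval metrics for one query (binary relevance)."""
    top_k = ranked_ids[:k]
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in top_k]

    # Reciprocal rank of the first relevant document in the top k.
    mrr = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            mrr = 1.0 / rank
            break

    # Discounted cumulative gain, normalised by the best possible ordering.
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))

    return {
        "accuracy": 1.0 if any(hits) else 0.0,
        "precision": sum(hits) / k,
        "recall": sum(hits) / len(relevant_ids),
        "mrr": mrr,
        "ndcg": dcg / idcg if idcg > 0 else 0.0,
    }


# Hypothetical ranking: the first relevant document appears at rank 2.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2"}
metrics = retrieval_metrics(ranked, relevant, k=3)
print({name: round(value, 4) for name, value in metrics.items()})
```

Under these definitions precision@1 and accuracy@1 coincide for every query, which matches the identical `cosine_accuracy@1` and `cosine_precision@1` rows above.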
Nano BEIR
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.2065 |
| cosine_accuracy@3 | 0.3392 |
| cosine_accuracy@5 | 0.4104 |
| cosine_accuracy@10 | 0.5121 |
| cosine_precision@1 | 0.2065 |
| cosine_precision@3 | 0.1419 |
| cosine_precision@5 | 0.1176 |
| cosine_precision@10 | 0.0882 |
| cosine_recall@1 | 0.1081 |
| cosine_recall@3 | 0.1913 |
| cosine_recall@5 | 0.2363 |
| cosine_recall@10 | 0.3015 |
| cosine_ndcg@10 | 0.2524 |
| cosine_mrr@10 | 0.2948 |
| cosine_map@100 | 0.204 |
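The Nano BEIR values appear to be unweighted means over the 13 per-dataset results reported under Information Retrieval. As a quick check, the sketch below averages the per-dataset `cosine_ndcg@10` scores copied from that table; the variable names are illustrative.

```python
# Per-dataset cosine_ndcg@10 scores, copied from the Information Retrieval table.
ndcg_at_10 = {
    "NanoClimateFEVER": 0.1412,
    "NanoDBPedia": 0.3415,
    "NanoFEVER": 0.2122,
    "NanoFiQA2018": 0.104,
    "NanoHotpotQA": 0.3505,
    "NanoMSMARCO": 0.2142,
    "NanoNFCorpus": 0.0987,
    "NanoNQ": 0.2052,
    "NanoQuoraRetrieval": 0.7993,
    "NanoSCIDOCS": 0.1348,
    "NanoArguAna": 0.2375,
    "NanoSciFact": 0.1937,
    "NanoTouche2020": 0.2486,
}

# Unweighted mean across datasets.
nano_beir_mean = sum(ndcg_at_10.values()) / len(ndcg_at_10)
print(round(nano_beir_mean, 4))  # 0.2524
```

The result matches the `cosine_ndcg@10` value of 0.2524 in the Nano BEIR table.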
Training Details
Training Dataset
gooaq
Evaluation Dataset
gooaq
Training Hyperparameters
Non-Default Hyperparameters
eval_strategy: steps
per_device_train_batch_size: 1024
per_device_eval_batch_size: 1024
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
bf16: True
batch_sampler: no_duplicates
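With `learning_rate: 2e-05`, `warmup_ratio: 0.1` and the default `lr_scheduler_type: linear`, the learning rate ramps up over the first tenth of training and then decays linearly to zero. The sketch below is a plain-Python approximation of that shape, not the trainer's own code; the function name is hypothetical, the total of 977 steps is taken from the training logs on this card, and rounding the warmup length up is an assumption.

```python
import math


def linear_warmup_lr(step, total_steps, base_lr, warmup_ratio):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    # Assumption: the warmup length is the ratio times total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 977  # final step in the training logs
for step in (0, 49, 98, 500, 977):
    print(step, linear_warmup_lr(step, TOTAL_STEPS, 2e-5, 0.1))
```

Under these assumptions the peak rate of 2e-05 is reached at step 98 and the rate returns to zero at step 977.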
All Hyperparameters
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 1024
per_device_eval_batch_size: 1024
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | Validation Loss | NanoClimateFEVER_cosine_ndcg@10 | NanoDBPedia_cosine_ndcg@10 | NanoFEVER_cosine_ndcg@10 | NanoFiQA2018_cosine_ndcg@10 | NanoHotpotQA_cosine_ndcg@10 | NanoMSMARCO_cosine_ndcg@10 | NanoNFCorpus_cosine_ndcg@10 | NanoNQ_cosine_ndcg@10 | NanoQuoraRetrieval_cosine_ndcg@10 | NanoSCIDOCS_cosine_ndcg@10 | NanoArguAna_cosine_ndcg@10 | NanoSciFact_cosine_ndcg@10 | NanoTouche2020_cosine_ndcg@10 | NanoBEIR_mean_cosine_ndcg@10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | - | - | 0.1174 | 0.3053 | 0.1405 | 0.0440 | 0.2821 | 0.2297 | 0.0773 | 0.1708 | 0.7830 | 0.1181 | 0.2017 | 0.1447 | 0.1642 | 0.2138 |
| 0.0010 | 1 | 3.6449 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.0256 | 25 | 3.6146 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.0512 | 50 | 3.6074 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.0768 | 75 | 3.5997 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1024 | 100 | 3.5737 | 2.0205 | 0.1178 | 0.3061 | 0.1477 | 0.0461 | 0.2837 | 0.2291 | 0.0804 | 0.1713 | 0.7791 | 0.1205 | 0.2049 | 0.1534 | 0.1731 | 0.2164 |
| 0.1279 | 125 | 3.5644 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1535 | 150 | 3.4792 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1791 | 175 | 3.4743 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2047 | 200 | 3.4169 | 1.9114 | 0.1336 | 0.3084 | 0.1446 | 0.0604 | 0.2965 | 0.2350 | 0.0847 | 0.1650 | 0.7806 | 0.1270 | 0.2141 | 0.1633 | 0.1835 | 0.2228 |
| 0.2303 | 225 | 3.3535 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2559 | 250 | 3.3336 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2815 | 275 | 3.3038 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3071 | 300 | 3.2576 | 1.8114 | 0.1359 | 0.3260 | 0.1733 | 0.0752 | 0.3167 | 0.2323 | 0.0851 | 0.1753 | 0.7843 | 0.1266 | 0.2218 | 0.1752 | 0.2012 | 0.2330 |
| 0.3327 | 325 | 3.2304 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3582 | 350 | 3.2133 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3838 | 375 | 3.1369 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4094 | 400 | 3.1412 | 1.7379 | 0.1389 | 0.3298 | 0.1930 | 0.0934 | 0.3261 | 0.2310 | 0.0852 | 0.1760 | 0.7850 | 0.1349 | 0.2235 | 0.1863 | 0.2118 | 0.2396 |
| 0.4350 | 425 | 3.0782 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4606 | 450 | 3.0948 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4862 | 475 | 3.0696 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5118 | 500 | 3.0641 | 1.6850 | 0.1373 | 0.3307 | 0.1945 | 0.0937 | 0.3301 | 0.2365 | 0.0931 | 0.1950 | 0.7933 | 0.1359 | 0.2231 | 0.1885 | 0.2289 | 0.2447 |
| 0.5374 | 525 | 3.0224 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5629 | 550 | 2.9927 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5885 | 575 | 2.9796 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6141 | 600 | 2.9624 | 1.6475 | 0.1397 | 0.3321 | 0.2058 | 0.0999 | 0.3422 | 0.2276 | 0.1014 | 0.1901 | 0.7971 | 0.1393 | 0.2258 | 0.1918 | 0.2342 | 0.2482 |
| 0.6397 | 625 | 2.9508 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6653 | 650 | 2.958 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6909 | 675 | 2.9428 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7165 | 700 | 2.9589 | 1.6209 | 0.1425 | 0.3344 | 0.2061 | 0.1050 | 0.3427 | 0.2295 | 0.1001 | 0.1868 | 0.7955 | 0.1342 | 0.2298 | 0.1922 | 0.2343 | 0.2487 |
| 0.7421 | 725 | 2.9152 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7677 | 750 | 2.9056 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7932 | 775 | 2.9111 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8188 | 800 | 2.9107 | 1.6037 | 0.1415 | 0.3401 | 0.2064 | 0.1053 | 0.3523 | 0.2153 | 0.1001 | 0.1934 | 0.7976 | 0.1340 | 0.2302 | 0.1946 | 0.2461 | 0.2505 |
| 0.8444 | 825 | 2.8675 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8700 | 850 | 2.9175 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8956 | 875 | 2.8592 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9212 | 900 | 2.86 | 1.5941 | 0.1411 | 0.3415 | 0.2180 | 0.1048 | 0.3506 | 0.2210 | 0.0987 | 0.2052 | 0.7988 | 0.1349 | 0.2302 | 0.1946 | 0.2464 | 0.2528 |
| 0.9468 | 925 | 2.8603 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9724 | 950 | 2.8909 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9980 | 975 | 2.8819 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 1.0 | 977 | - | - | 0.1412 | 0.3415 | 0.2122 | 0.1040 | 0.3505 | 0.2142 | 0.0987 | 0.2052 | 0.7993 | 0.1348 | 0.2375 | 0.1937 | 0.2486 | 0.2524 |
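The Epoch column in the training logs is the step number divided by the 977 steps in the single epoch. The short sketch below reproduces a few of those values and derives a rough upper bound on the number of training pairs, assuming each step consumes at most one batch of 1024 on the single GPU listed under Training Hardware. The helper name is illustrative.

```python
TOTAL_STEPS = 977   # last step in the training logs (one epoch)
BATCH_SIZE = 1024   # per_device_train_batch_size


def epoch_fraction(step, total_steps=TOTAL_STEPS):
    """Fraction of the epoch completed at a given step, rounded as in the logs."""
    return round(step / total_steps, 4)


for step in (1, 25, 100, 500):
    print(step, epoch_fraction(step))

# Upper bound on the number of training pairs seen in the epoch
# (the final batch may be smaller than BATCH_SIZE).
max_pairs = TOTAL_STEPS * BATCH_SIZE
print(max_pairs)  # 1000448
```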
Environmental Impact
Carbon emissions were measured using CodeCarbon.
- Energy Consumed: 0.025 kWh
- Carbon Emitted: 0.010 kg of CO2
- Hours Used: 0.15 hours
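As a rough sanity check on these figures, the reported values imply an average power draw and a carbon intensity for the electricity used. The sketch below only rearranges the three numbers above; because they are rounded, the derived values are approximate, and the variable names are illustrative.

```python
energy_kwh = 0.025   # Energy Consumed
carbon_kg = 0.010    # Carbon Emitted
hours = 0.15         # Hours Used

# Average power over the run, in watts.
average_power_w = energy_kwh / hours * 1000
# Implied carbon intensity of the electricity, in kg of CO2 per kWh.
carbon_intensity = carbon_kg / energy_kwh
# Training duration in minutes.
minutes = hours * 60

print(round(average_power_w), "W")
print(round(carbon_intensity, 3), "kg CO2 per kWh")
print(round(minutes), "minutes")
```

This works out to roughly 167 W on average, about 0.4 kg of CO2 per kWh, and a run of about 9 minutes.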
Training Hardware
- On Cloud: No
- GPU Model: 1 x NVIDIA GeForce RTX 3090
- CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
- RAM Size: 31.78 GB
Framework Versions
- Python: 3.11.6
- Sentence Transformers: 3.3.0.dev0
- Transformers: 4.46.2
- PyTorch: 2.5.0+cu121
- Accelerate: 1.0.0
- Datasets: 2.20.0
- Tokenizers: 0.20.3
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
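The MultipleNegativesRankingLoss cited above treats every other answer in a batch as a negative for a given question. The following is a minimal plain-Python sketch of that idea, assuming cosine similarity and a scale factor of 20 (both assumptions here, as are the function names and toy vectors); it is not the library's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(queries, answers, scale=20.0):
    """Mean cross-entropy where answers[i] is the positive for queries[i]
    and all other answers in the batch serve as negatives."""
    losses = []
    for i, query in enumerate(queries):
        scores = [scale * cosine(query, answer) for answer in answers]
        # Numerically stable log-sum-exp over all candidate answers.
        max_score = max(scores)
        log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
        # Negative log-probability assigned to the matching answer.
        losses.append(log_sum_exp - scores[i])
    return sum(losses) / len(losses)


# Toy batch (hypothetical 2-dimensional embeddings).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each answer matches its query
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each answer matches the other query

aligned_loss = multiple_negatives_ranking_loss(queries, aligned)
swapped_loss = multiple_negatives_ranking_loss(queries, swapped)
print(aligned_loss, swapped_loss)
```

Because every other in-batch answer counts as a negative, a duplicated answer in the batch would be penalised as if it were wrong, which is one reason a `no_duplicates` batch sampler is a natural fit for this kind of loss.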