---
language:
- en
license: apache-2.0
tags:
- biencoder
- sentence-transformers
- text-classification
- sentence-pair-classification
- semantic-similarity
- semantic-search
- retrieval
- reranking
- generated_from_trainer
- dataset_size:483820
- loss:MultipleNegativesSymmetricRankingLoss
base_model: Alibaba-NLP/gte-modernbert-base
widget:
- source_sentence: >-
    See Precambrian time scale # Proposed Geologic timeline for another set of
    periods 4600 -- 541 MYA .
  sentences:
  - >-
    In 2014 election , Biju Janata Dal candidate Tathagat Satapathy
    Bharatiya Janata party candidate Rudra Narayan Pany defeated with a
    margin of 1.37,340 votes .
  - >-
    In Scotland , the Strathclyde Partnership for Transport , formerly known
    as Strathclyde Passenger Transport Executive , comprises the former
    Strathclyde region , which includes the urban area around Glasgow .
  - >-
    See Precambrian Time Scale # Proposed Geological Timeline for another
    set of periods of 4600 -- 541 MYA .
- source_sentence: >-
    It is also 5 kilometers northeast of Tamaqua , 27 miles south of Allentown
    and 9 miles northwest of Hazleton .
  sentences:
  - In 1948 he moved to Massachusetts , and eventually settled in Vermont .
  - >-
    Suddenly I remembered that I was a New Zealander , I caught the first
    plane home and came back .
  - >-
    It is also 5 miles northeast of Tamaqua , 27 miles south of Allentown ,
    and 9 miles northwest of Hazleton .
- source_sentence: >-
    The party has a Member of Parliament , a member of the House of Lords ,
    three members of the London Assembly and two Members of the European
    Parliament .
  sentences:
  - >-
    The party has one Member of Parliament , one member of the House of
    Lords , three Members of the London Assembly and two Members of the
    European Parliament .
  - >-
    Grapsid crabs dominate in Australia , Malaysia and Panama , while
    gastropods Cerithidea scalariformis and Melampus coeffeus are important
    seed predators in Florida mangroves .
  - >-
    Music Story is a music service website and international music data
    provider that curates , aggregates and analyses metadata for digital
    music services .
- source_sentence: >-
    The play received two 1969 Tony Award nominations : Best Actress in a Play
    ( Michael Annals ) and Best Costume Design ( Charlotte Rae ) .
  sentences:
  - >-
    Ravishanker is a fellow of the International Statistical Institute and
    an elected member of the American Statistical Association .
  - >-
    In 1969 , the play received two Tony - Award nominations : Best Actress
    in a Theatre Play ( Michael Annals ) and Best Costume Design ( Charlotte
    Rae ) .
  - >-
    AMD and Nvidia both have proprietary methods of scaling , CrossFireX for
    AMD , and SLI for Nvidia .
- source_sentence: >-
    He was a close friend of Ángel Cabrera and is a cousin of golfer Tony
    Croatto .
  sentences:
  - >-
    He was a close friend of Ángel Cabrera , and is a cousin of golfer Tony
    Croatto .
  - >-
    Eugenijus Bartulis ( born December 7 , 1949 in Kaunas ) is a Lithuanian
    Roman Catholic priest , and Bishop of Šiauliai .
  - >-
    UWIRE also distributes its members content to professional media outlets
    , including Yahoo , CNN and CBS News .
datasets:
- redis/langcache-sentencepairs-v1
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
- cosine_accuracy_threshold
- cosine_f1
- cosine_f1_threshold
- cosine_precision
- cosine_recall
- cosine_ap
- cosine_mcc
model-index:
- name: Redis fine-tuned BiEncoder model for semantic caching on LangCache
  results:
  - task:
      type: binary-classification
      name: Binary Classification
    dataset:
      name: test
      type: test
    metrics:
    - type: cosine_accuracy
      value: 0.7037777526966672
      name: Cosine Accuracy
    - type: cosine_accuracy_threshold
      value: 0.8524033427238464
      name: Cosine Accuracy Threshold
    - type: cosine_f1
      value: 0.7122170715871171
      name: Cosine F1
    - type: cosine_f1_threshold
      value: 0.8118724822998047
      name: Cosine F1 Threshold
    - type: cosine_precision
      value: 0.5989283084033827
      name: Cosine Precision
    - type: cosine_recall
      value: 0.8783612662942272
      name: Cosine Recall
    - type: cosine_ap
      value: 0.6476665223951498
      name: Cosine Ap
    - type: cosine_mcc
      value: 0.44182914870985407
      name: Cosine Mcc
---
Redis fine-tuned BiEncoder model for semantic caching on LangCache
This is a sentence-transformers model fine-tuned from Alibaba-NLP/gte-modernbert-base on the LangCache Sentence Pairs (all) dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used to score sentence-pair similarity, such as matching paraphrased prompts for semantic caching.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Alibaba-NLP/gte-modernbert-base
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: LangCache Sentence Pairs (all)
- Language: en
- License: apache-2.0
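Because the similarity function is cosine similarity, a score depends only on the angle between two embeddings, not on their length. The NumPy sketch below uses random toy vectors (not real model output) to show that cosine similarity is the dot product of L2-normalized vectors; for the same reason, `model.encode(..., normalize_embeddings=True)` followed by a plain dot product should give the same scores as `model.similarity`, up to floating-point precision. The helper name `cosine_similarity_matrix` is illustrative only.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    # L2-normalize each row; cosine similarity is then a plain dot product.
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Toy 768-dimensional vectors standing in for model embeddings (hypothetical data).
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 768))
y = np.vstack([2.0 * x[0], -x[1]])  # a scaled copy and a negated copy

sims = cosine_similarity_matrix(x, y)
# Scaling a vector leaves its cosine unchanged; negating it flips the sign.
print(np.round(np.diag(sims), 4))
```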
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
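The Pooling module above enables only `pooling_mode_cls_token`, so each sentence embedding is the 768-dimensional vector the transformer outputs at the first ([CLS]) token position, and the other token vectors are discarded. The NumPy sketch below uses random arrays as stand-ins for ModernBERT token outputs and places CLS pooling next to masked mean pooling (which this model does not use) for contrast. The function names are illustrative; this is not the Sentence Transformers implementation.

```python
import numpy as np

def cls_pooling(token_embeddings: np.ndarray) -> np.ndarray:
    """Keep the first ([CLS]) token vector of each sequence: (batch, seq, dim) -> (batch, dim)."""
    return token_embeddings[:, 0, :]

def mean_pooling(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average only non-padding tokens; shown for contrast, disabled in this model."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

# Hypothetical transformer output: 2 sequences, 4 token positions, 768 dims.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 768))
mask = np.array([[1, 1, 1, 1],
                 [1, 1, 0, 0]])  # the second sequence has two padding positions

cls_vecs = cls_pooling(tokens)
mean_vecs = mean_pooling(tokens, mask)
print(cls_vecs.shape, mean_vecs.shape)
```

Note that CLS pooling ignores the attention mask entirely: padding never affects the first position, so no masking step is needed.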
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("redis/langcache-embed-v3")
# Run inference
sentences = [
    'He was a close friend of Ángel Cabrera and is a cousin of golfer Tony Croatto .',
    'He was a close friend of Ángel Cabrera , and is a cousin of golfer Tony Croatto .',
    'UWIRE also distributes its members content to professional media outlets , including Yahoo , CNN and CBS News .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[0.9922, 0.9922, 0.5352],
#         [0.9922, 0.9961, 0.5391],
#         [0.5352, 0.5391, 1.0000]], dtype=torch.bfloat16)
```
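Since this model is meant for semantic caching, here is a hedged sketch of how its embeddings could back a cache lookup: embed an incoming prompt, compare it with the cached prompts by cosine similarity, and reuse the stored response when the best score clears a threshold. `SemanticCache` and `model_embedder` are hypothetical helpers written for this illustration; they are not part of Sentence Transformers, Redis, or LangCache. The default threshold of 0.8119 is simply the F1-optimal cosine threshold reported in the Evaluation section and should be re-tuned on your own data.

```python
from typing import Callable, List, Optional, Tuple

import numpy as np

class SemanticCache:
    """Toy in-memory semantic cache (hypothetical helper, not a Redis or LangCache API)."""

    def __init__(self, embed: Callable[[List[str]], np.ndarray], threshold: float = 0.8119):
        self.embed = embed          # maps a list of strings to an (n, dim) array
        self.threshold = threshold  # minimum cosine similarity that counts as a hit
        self.keys: List[np.ndarray] = []
        self.values: List[str] = []

    @staticmethod
    def _normalize(v: np.ndarray) -> np.ndarray:
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    def put(self, prompt: str, response: str) -> None:
        # Store the unit-length prompt embedding alongside its response.
        self.keys.append(self._normalize(self.embed([prompt])[0]))
        self.values.append(response)

    def get(self, prompt: str) -> Optional[Tuple[str, float]]:
        if not self.keys:
            return None
        query = self._normalize(self.embed([prompt])[0])
        scores = np.stack(self.keys) @ query  # cosine similarity to every cached prompt
        best = int(np.argmax(scores))
        if scores[best] >= self.threshold:
            return self.values[best], float(scores[best])
        return None  # cache miss: call the LLM, then put() the new answer

def model_embedder(model_id: str = "redis/langcache-embed-v3"):
    """Build an embed function from this model (downloads the weights on first use)."""
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(model_id)
    return lambda texts: np.asarray(model.encode(texts), dtype=np.float32)
```

For example, `cache = SemanticCache(model_embedder())`, then `cache.put(prompt, answer)` after each LLM call and `cache.get(new_prompt)` before the next one. The linear scan keeps the sketch short; a larger cache would typically use a vector index instead.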
Evaluation
Metrics
Binary Classification
- Dataset: test
- Evaluated with BinaryClassificationEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy | 0.7038 |
| cosine_accuracy_threshold | 0.8524 |
| cosine_f1 | 0.7122 |
| cosine_f1_threshold | 0.8119 |
| cosine_precision | 0.5989 |
| cosine_recall | 0.8784 |
| cosine_ap | 0.6477 |
| cosine_mcc | 0.4418 |
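The evaluator sweeps a decision threshold over the cosine scores of labeled pairs and reports the threshold that maximizes accuracy and, separately, the one that maximizes F1. The reported precision (0.5989) and recall (0.8784) combine to the reported F1 of 0.7122, so they appear to be measured at the F1 threshold of 0.8119. The simplified sweep below, on six made-up scores, tries every observed score as a threshold and predicts "same meaning" when score >= threshold. It is a sketch of the idea under those assumptions; the library's exact candidate thresholds and tie-breaking may differ.

```python
import numpy as np

def best_accuracy_threshold(scores, labels):
    """Return (accuracy, threshold) maximizing accuracy when predicting 1 for score >= threshold."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels, dtype=int)
    best = (0.0, None)
    for t in np.unique(scores):
        acc = float(np.mean((scores >= t) == labels))
        if acc > best[0]:
            best = (acc, float(t))
    return best

def best_f1_threshold(scores, labels):
    """Return (f1, precision, recall, threshold) maximizing F1 for the positive class."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels, dtype=int)
    best = (0.0, 0.0, 0.0, None)
    for t in np.unique(scores):
        pred = scores >= t
        tp = int(np.sum(pred & (labels == 1)))
        if tp == 0:
            continue
        precision = tp / int(np.sum(pred))
        recall = tp / int(np.sum(labels == 1))
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[0]:
            best = (f1, precision, recall, float(t))
    return best

# Hypothetical cosine scores for six sentence pairs (1 = paraphrase, 0 = not).
scores = [0.95, 0.90, 0.86, 0.83, 0.70, 0.40]
labels = [1, 1, 0, 1, 0, 0]
print(best_accuracy_threshold(scores, labels))
print(best_f1_threshold(scores, labels))
```

As in the table above, the two criteria can pick different thresholds; a lower threshold trades precision for recall.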
Training Details
Training Dataset
LangCache Sentence Pairs (all)
- Dataset: LangCache Sentence Pairs (all)
- Size: 26,850 training samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:

|  | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 8 tokens, mean: 27.35 tokens, max: 53 tokens | min: 8 tokens, mean: 27.27 tokens, max: 52 tokens | 1: 100.00% |

- Samples:

| sentence1 | sentence2 | label |
|---|---|---|
| The newer Punts are still very much in existence today and race in the same fleets as the older boats . | The newer punts are still very much in existence today and run in the same fleets as the older boats . | 1 |
| After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall . | Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall . | 1 |
| The 12F was officially homologated on August 21 , 1929 and exhibited at the Paris Salon in 1930 . | The 12F was officially homologated on 21 August 1929 and displayed at the 1930 Paris Salon . | 1 |

- Loss: MultipleNegativesSymmetricRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim", "gather_across_devices": false }
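Every label is 1, so the data contains only positive pairs and this loss uses the other pairs in a batch as negatives. The NumPy sketch below restates that idea with the listed parameters (`scale` 20.0, cosine similarity): cross-entropy over scaled similarity scores in both directions, anchor to positive and positive to anchor, averaged. It is an illustration under these assumptions rather than the library's code; `symmetric_mnr_loss` is a hypothetical name, and any hard-negative columns are ignored here.

```python
import numpy as np

def _log_softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax."""
    x = x - x.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def symmetric_mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch symmetric ranking loss: row i of anchors should match row i of positives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)  # (batch, batch) scaled cosine similarities
    idx = np.arange(len(a))
    forward = -_log_softmax(scores)[idx, idx].mean()     # anchor -> its positive among all positives
    backward = -_log_softmax(scores.T)[idx, idx].mean()  # positive -> its anchor among all anchors
    return float((forward + backward) / 2)

# Toy batch of 4 pairs with 16-dimensional embeddings (hypothetical data).
rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 16))
matched = symmetric_mnr_loss(anchors, anchors + 0.01 * rng.normal(size=anchors.shape))
shuffled = symmetric_mnr_loss(anchors, anchors[::-1])  # pairs deliberately misaligned
print(matched, shuffled)  # the matched batch should have the much smaller loss
```

Larger batches supply more in-batch negatives per pair, which is one reason this family of losses usually benefits from bigger batch sizes.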
Evaluation Dataset
LangCache Sentence Pairs (all)
- Dataset: LangCache Sentence Pairs (all)
- Size: 26,850 evaluation samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:

|  | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 8 tokens, mean: 27.35 tokens, max: 53 tokens | min: 8 tokens, mean: 27.27 tokens, max: 52 tokens | 1: 100.00% |

- Samples:

| sentence1 | sentence2 | label |
|---|---|---|
| The newer Punts are still very much in existence today and race in the same fleets as the older boats . | The newer punts are still very much in existence today and run in the same fleets as the older boats . | 1 |
| After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall . | Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall . | 1 |
| The 12F was officially homologated on August 21 , 1929 and exhibited at the Paris Salon in 1930 . | The 12F was officially homologated on 21 August 1929 and displayed at the 1930 Paris Salon . | 1 |

- Loss: MultipleNegativesSymmetricRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim", "gather_across_devices": false }
Training Logs
| Epoch | Step | test_cosine_ap |
|---|---|---|
| -1 | -1 | 0.6477 |
Framework Versions
- Python: 3.12.3
- Sentence Transformers: 5.1.0
- Transformers: 4.56.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```