---
license: mit
datasets:
- dleemiller/wiki-sim
- sentence-transformers/stsb
language:
- en
metrics:
- spearmanr
- pearsonr
base_model:
- chandar-lab/NeoBERT
pipeline_tag: text-classification
library_name: sentence-transformers
tags:
- cross-encoder
- neobert
- stsb
- stsbenchmark-sts
model-index:
- name: CrossEncoder based on chandar-lab/NeoBERT
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev
      type: sts-dev
    metrics:
    - type: pearson_cosine
      value: 0.9208501169893029
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.9211827194606879
      name: Spearman Cosine
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.9123513299488885
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.9087449124017827
      name: Spearman Cosine
---
# NeoBERT Cross-Encoder: Semantic Similarity (STS)
Cross-encoders are high-performing encoder models that jointly read two texts and output a similarity score between 0 and 1.
I've found the `cross-encoder/stsb-roberta-large` model to be very useful for building evaluators of LLM outputs.
Cross-encoders are simple to use, fast, and very accurate.

---
## Features
- **High performing:** Achieves **Pearson: 0.9124** and **Spearman: 0.9087** on the STS-Benchmark test set.
- **Efficient architecture:** Built on the NeoBERT design (250M parameters), offering fast inference.
- **Extended context length:** Handles sequences up to 4096 tokens, well suited to scoring long LLM outputs (see the sketch below).
- **Diversified training:** Pretrained on `dleemiller/wiki-sim` and fine-tuned on `sentence-transformers/stsb`.
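
As a quick illustration of the long-context use case, here is a minimal sketch that scores a long LLM answer against a short reference. The `max_length=4096` cap and the `trust_remote_code=True` flag are assumptions to verify against your `sentence-transformers` version (NeoBERT is a custom architecture, so remote code may be required):

```python
from sentence_transformers import CrossEncoder

# Assumption: NeoBERT is a custom architecture, so trust_remote_code=True
# may be required depending on your library versions.
model = CrossEncoder("dleemiller/NeoCE-sts", max_length=4096, trust_remote_code=True)

reference = "The meeting was moved from Tuesday to Thursday at 3 pm."
# Stand-in for a long, multi-paragraph LLM answer.
long_answer = " ".join(["The assistant restated the schedule change in detail."] * 150)

# Pairs may mix short and long texts; inputs are truncated at max_length.
score = model.predict([(reference, long_answer)])[0]
print(f"similarity: {score:.4f}")
```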
---
## Performance
| Model | STS-B Test Pearson | STS-B Test Spearman | Context Length | Parameters | Speed |
|--------------------------------|--------------------|---------------------|----------------|------------|---------|
| `ModernCE-large-sts` | **0.9256** | **0.9215** | **8192** | 395M | **Medium** |
| `ModernCE-base-sts` | **0.9162** | **0.9122** | **8192** | 149M | **Fast** |
| `NeoCE-sts` | **0.9124** | **0.9087** | **4096** | 250M | **Fast** |
| `stsb-roberta-large` | 0.9147 | - | 512 | 355M | Slow |
| `stsb-distilroberta-base` | 0.8792 | - | 512 | 82M | Fast |
---
## Usage
To use NeoCE for semantic similarity tasks, you can load the model with the Hugging Face `sentence-transformers` library:
```python
from sentence_transformers import CrossEncoder

# Load the NeoCE model
model = CrossEncoder("dleemiller/NeoCE-sts")

# Predict similarity scores for sentence pairs
sentence_pairs = [
    ("It's a wonderful day outside.", "It's so sunny today!"),
    ("It's a wonderful day outside.", "He drove to work earlier."),
]
scores = model.predict(sentence_pairs)
print(scores)  # Outputs: array([0.9184, 0.0123], dtype=float32)
```
### Output
The model returns similarity scores in the range `[0, 1]`, where higher scores indicate stronger semantic similarity.
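
When using the score as a pass/fail signal (e.g., as an LLM-output evaluator), you must choose a decision threshold yourself. The helper below is a hypothetical sketch: the `is_semantically_equivalent` name and the 0.8 cutoff are illustrative assumptions, not values shipped with the model.

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("dleemiller/NeoCE-sts")

def is_semantically_equivalent(reference: str, candidate: str, threshold: float = 0.8) -> bool:
    """Hypothetical evaluator: accept a pair as equivalent above a chosen cutoff."""
    score = model.predict([(reference, candidate)])[0]
    return bool(score >= threshold)

print(is_semantically_equivalent("It's a wonderful day outside.", "It's so sunny today!"))   # True
print(is_semantically_equivalent("It's a wonderful day outside.", "He drove to work earlier."))  # False
```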
---
## Training Details
### Pretraining
The model was pretrained on the `pair-score-sampled` subset of the [`dleemiller/wiki-sim`](https://huggingface.co/datasets/dleemiller/wiki-sim) dataset. This dataset provides diverse sentence pairs with semantic similarity scores, helping the model build a robust understanding of relationships between sentences.
- **Classifier dropout:** a relatively large classifier dropout of 0.3 is used to reduce overreliance on the teacher's scores.
- **Objective:** regression against STS-B-style similarity scores produced by the `cross-encoder/stsb-roberta-large` teacher model (see the sketch below).
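
The exact training script is not included here; the following is a minimal sketch of this kind of teacher-score regression using the classic `sentence-transformers` fit API. The `sentence1`/`sentence2`/`score` column names, the `classifier_dropout` keyword, the `trust_remote_code` flag, and all hyperparameters are assumptions, not the recorded training configuration.

```python
from datasets import load_dataset
from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample

# Teacher-scored pairs: each example carries a similarity score in [0, 1]
# distilled from cross-encoder/stsb-roberta-large (column names assumed).
train = load_dataset("dleemiller/wiki-sim", "pair-score-sampled", split="train")
train_samples = [
    InputExample(texts=[row["sentence1"], row["sentence2"]], label=row["score"])
    for row in train
]

# num_labels=1 gives a single-score regression head; classifier_dropout=0.3
# matches the dropout described above (keyword support is version-dependent).
model = CrossEncoder(
    "chandar-lab/NeoBERT",
    num_labels=1,
    classifier_dropout=0.3,
    trust_remote_code=True,
)

loader = DataLoader(train_samples, shuffle=True, batch_size=32)
model.fit(train_dataloader=loader, epochs=1, warmup_steps=1000)  # illustrative values
```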
### Fine-Tuning
Fine-tuning was performed on the [`sentence-transformers/stsb`](https://huggingface.co/datasets/sentence-transformers/stsb) dataset, using its human-annotated similarity scores as the regression target.
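
To sanity-check the test-set numbers reported above, here is a minimal evaluation sketch. It assumes the standard `sentence1`/`sentence2`/`score` columns of `sentence-transformers/stsb`, with gold scores already normalized to [0, 1]:

```python
from datasets import load_dataset
from scipy.stats import pearsonr, spearmanr
from sentence_transformers import CrossEncoder

model = CrossEncoder("dleemiller/NeoCE-sts")
test = load_dataset("sentence-transformers/stsb", split="test")

pairs = list(zip(test["sentence1"], test["sentence2"]))
preds = model.predict(pairs)
labels = test["score"]  # gold similarity, normalized to [0, 1]

print(f"Pearson:  {pearsonr(preds, labels)[0]:.4f}")   # reported: 0.9124
print(f"Spearman: {spearmanr(preds, labels)[0]:.4f}")  # reported: 0.9087
```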
---
## Model Card
- **Architecture:** NeoBERT
- **Pretraining Data:** `dleemiller/wiki-sim` (`pair-score-sampled` subset)
- **Fine-Tuning Data:** `sentence-transformers/stsb`
---
## Thank You
Thanks to the chandar-lab team for providing the NeoBERT models, and to the Sentence Transformers team for their leadership in transformer encoder models.

---
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{neocests2025,
  author    = {Miller, D. Lee},
  title     = {NeoCE STS: An STS cross encoder model},
  year      = {2025},
  publisher = {Hugging Face Hub},
  url       = {https://huggingface.co/dleemiller/NeoCE-sts},
}
```
---
## License
This model is licensed under the [MIT License](LICENSE). |