---
license: mit
datasets:
  - dleemiller/wiki-sim
  - sentence-transformers/stsb
language:
  - en
metrics:
  - spearmanr
  - pearsonr
base_model:
  - chandar-lab/NeoBERT
pipeline_tag: text-classification
library_name: sentence-transformers
tags:
  - cross-encoder
  - neobert
  - stsb
  - stsbenchmark-sts
model-index:
  - name: CrossEncoder based on chandar-lab/NeoBERT
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.9208501169893029
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.9211827194606879
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.9123513299488885
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.9087449124017827
            name: Spearman Cosine
---

# NeoBERT Cross-Encoder: Semantic Similarity (STS)

Cross-encoders are high-performing encoder models that compare two texts and output a similarity score between 0 and 1. I've found the cross-encoder/stsb-roberta-large model to be very useful for building evaluators for LLM outputs. They're simple to use, fast, and very accurate.


## Features

- **High performance:** Achieves Pearson 0.9124 and Spearman 0.9087 on the STS-Benchmark test set.
- **Efficient architecture:** Based on the NeoBERT design (250M parameters), offering faster inference than larger cross-encoders.
- **Extended context length:** Processes sequences up to 4096 tokens, well suited to evaluating LLM outputs.
- **Diversified training:** Pretrained on dleemiller/wiki-sim and fine-tuned on sentence-transformers/stsb.

## Performance

| Model                   | STS-B Test Pearson | STS-B Test Spearman | Context Length | Parameters | Speed  |
|-------------------------|--------------------|---------------------|----------------|------------|--------|
| ModernCE-large-sts      | 0.9256             | 0.9215              | 8192           | 395M       | Medium |
| ModernCE-base-sts       | 0.9162             | 0.9122              | 8192           | 149M       | Fast   |
| **NeoCE-sts**           | 0.9124             | 0.9087              | 4096           | 250M       | Fast   |
| stsb-roberta-large      | 0.9147             | -                   | 512            | 355M       | Slow   |
| stsb-distilroberta-base | 0.8792             | -                   | 512            | 82M        | Fast   |

## Usage

To use NeoCE for semantic similarity tasks, you can load the model with the Hugging Face sentence-transformers library:

```python
from sentence_transformers import CrossEncoder

# Load the NeoCE model
model = CrossEncoder("dleemiller/NeoCE-sts")

# Predict similarity scores for sentence pairs
sentence_pairs = [
    ("It's a wonderful day outside.", "It's so sunny today!"),
    ("It's a wonderful day outside.", "He drove to work earlier."),
]
scores = model.predict(sentence_pairs)

print(scores)  # Outputs: array([0.9184, 0.0123], dtype=float32)
```

### Output

The model returns similarity scores in the range [0, 1], where higher scores indicate stronger semantic similarity.
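
If you need a binary decision (for example, a pass/fail check in an LLM evaluator), one option is to threshold the score. A minimal sketch; the 0.5 cutoff below is an illustrative choice, not a calibrated one:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("dleemiller/NeoCE-sts")

# Illustrative threshold; tune it on validation data for your task.
THRESHOLD = 0.5

pairs = [("The cat sat on the mat.", "A cat is sitting on a mat.")]
scores = model.predict(pairs)
decisions = [score >= THRESHOLD for score in scores]
print(list(zip(scores, decisions)))
```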


## Training Details

### Pretraining

The model was pretrained on the pair-score-sampled subset of the dleemiller/wiki-sim dataset. This dataset provides diverse sentence pairs with semantic similarity scores, helping the model build a robust understanding of relationships between sentences.

- **Classifier dropout:** A relatively large classifier dropout of 0.3 is used to reduce over-reliance on teacher scores.
- **Objective:** Regression on STS-B-style similarity scores produced by the cross-encoder/stsb-roberta-large teacher.
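
A minimal sketch of what this pretraining stage could look like with the classic sentence-transformers `CrossEncoder.fit` API (v3-style). The dataset config name, column names, and hyperparameters below are assumptions for illustration, not the exact training recipe:

```python
from torch.utils.data import DataLoader
from datasets import load_dataset
from sentence_transformers import CrossEncoder, InputExample

# Teacher-scored pairs from wiki-sim (config and column names assumed).
ds = load_dataset("dleemiller/wiki-sim", "pair-score-sampled", split="train")
train_samples = [
    InputExample(texts=[row["sentence1"], row["sentence2"]], label=row["score"])
    for row in ds
]
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=32)

# classifier_dropout=0.3 discourages over-reliance on the teacher scores.
model = CrossEncoder(
    "chandar-lab/NeoBERT",
    num_labels=1,
    max_length=4096,
    trust_remote_code=True,
    classifier_dropout=0.3,
)
model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=1000)
```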

### Fine-Tuning

Fine-tuning was performed on the sentence-transformers/stsb dataset.
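
The fine-tuning stage could be sketched the same way. The checkpoint path and hyperparameters below are placeholders; `CECorrelationEvaluator` reports the Pearson/Spearman correlations shown in the results above:

```python
from torch.utils.data import DataLoader
from datasets import load_dataset
from sentence_transformers import CrossEncoder, InputExample
from sentence_transformers.cross_encoder.evaluation import CECorrelationEvaluator

stsb = load_dataset("sentence-transformers/stsb")
train_samples = [
    InputExample(texts=[row["sentence1"], row["sentence2"]], label=row["score"])
    for row in stsb["train"]
]
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=32)

dev_samples = [
    InputExample(texts=[row["sentence1"], row["sentence2"]], label=row["score"])
    for row in stsb["validation"]
]
evaluator = CECorrelationEvaluator.from_input_examples(dev_samples, name="sts-dev")

# Hypothetical path to the checkpoint produced by the pretraining stage.
model = CrossEncoder("path/to/pretrained-neoce", num_labels=1, trust_remote_code=True)
model.fit(
    train_dataloader=train_dataloader,
    evaluator=evaluator,
    epochs=4,
    warmup_steps=100,
)
```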


## Model Card

- **Architecture:** NeoBERT
- **Pretraining Data:** dleemiller/wiki-sim (pair-score-sampled)
- **Fine-Tuning Data:** sentence-transformers/stsb

## Thank You

Thanks to the chandar-lab team for providing the NeoBERT models, and to the Sentence Transformers team for their leadership in transformer encoder models.


## Citation

If you use this model in your research, please cite:

```bibtex
@misc{neocests2025,
  author = {Miller, D. Lee},
  title = {NeoCE STS: An STS cross encoder model},
  year = {2025},
  publisher = {Hugging Face Hub},
  url = {https://huggingface.co/dleemiller/NeoCE-sts},
}
```

## License

This model is licensed under the MIT License.