dleemiller committed on
Commit e9abc08 · verified · 1 Parent(s): 287771a

Update README.md

Files changed (1)
  1. README.md +100 -1
README.md CHANGED
@@ -46,4 +46,103 @@ model-index:
  - type: spearman_cosine
    value: 0.9087449124017827
    name: Spearman Cosine
---
# NeoBERT Cross-Encoder: Semantic Similarity (STS)

Cross-encoders are high-performing encoder models that process two texts together and output a score between 0 and 1.
I've found the `cross-encoder/stsb-roberta-large` model to be very useful for building evaluators for LLM outputs.
They are simple to use, fast, and very accurate.

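A minimal sketch of that evaluator pattern, assuming each LLM output is graded by its similarity to a reference answer. `evaluate_outputs`, `EvalResult`, `pass_rate`, and the 0.7 threshold are hypothetical names and values, not part of this model or of `sentence-transformers`. The scorer is passed in as any callable shaped like `CrossEncoder.predict` (a list of text pairs in, one score per pair out), so the logic can be exercised without downloading a model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Any scorer shaped like CrossEncoder.predict: a list of (text_a, text_b) pairs in,
# one similarity score per pair out.
Scorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


@dataclass
class EvalResult:
    reference: str
    output: str
    score: float
    passed: bool


def evaluate_outputs(
    predict: Scorer,
    references: Sequence[str],
    outputs: Sequence[str],
    threshold: float = 0.7,  # hypothetical cut-off; tune it on labelled examples
) -> List[EvalResult]:
    """Score each (reference, output) pair in one batch and apply a pass/fail threshold."""
    if len(references) != len(outputs):
        raise ValueError("references and outputs must have the same length")
    pairs = list(zip(references, outputs))
    scores = predict(pairs)
    return [
        EvalResult(ref, out, float(s), float(s) >= threshold)
        for (ref, out), s in zip(pairs, scores)
    ]


def pass_rate(results: Sequence[EvalResult]) -> float:
    """Fraction of results that met the threshold (0.0 for an empty list)."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


# Usage with a model loaded as in the Usage section:
# results = evaluate_outputs(model.predict, reference_answers, llm_outputs)
# print(pass_rate(results))
```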

---

## Features
- **High-performing:** Achieves **Pearson: 0.9124** and **Spearman: 0.9087** on the STS-Benchmark test set.
- **Efficient architecture:** Built on NeoBERT (250M parameters) for fast inference.
- **Extended context length:** Accepts sequences of up to 4096 tokens, which suits evaluating long LLM outputs.
- **Diversified training:** Pretrained on `dleemiller/wiki-sim` and fine-tuned on `sentence-transformers/stsb`.

---

## Performance

| Model | STS-B Test Pearson | STS-B Test Spearman | Context Length (tokens) | Parameters | Speed |
|--------------------------------|--------------------|---------------------|-------------------------|------------|--------|
| `ModernCE-large-sts` | **0.9256** | **0.9215** | **8192** | 395M | Medium |
| `ModernCE-base-sts` | 0.9162 | 0.9122 | **8192** | 149M | Fast |
| `NeoCE-sts` | 0.9124 | 0.9087 | 4096 | 250M | Fast |
| `stsb-roberta-large` | 0.9147 | - | 512 | 355M | Slow |
| `stsb-distilroberta-base` | 0.8792 | - | 512 | 82M | Fast |

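The Pearson and Spearman columns measure how well predicted scores correlate with the gold STS-B test labels. The dependency-free sketch below shows both metrics, assuming predictions and gold labels are already available as equal-length lists; ties get average ranks, matching the default behaviour of `scipy.stats.spearmanr`. It illustrates the formulas and is not the script that produced the table.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```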

---

## Usage

To use NeoCE for semantic similarity, load it with the `CrossEncoder` class from the `sentence-transformers` library:

```python
from sentence_transformers import CrossEncoder

# Load the NeoCE model. NeoBERT relies on custom modeling code, so
# trust_remote_code=True is likely required.
model = CrossEncoder("dleemiller/NeoCE-sts", trust_remote_code=True)

# Predict similarity scores for sentence pairs
sentence_pairs = [
    ("It's a wonderful day outside.", "It's so sunny today!"),
    ("It's a wonderful day outside.", "He drove to work earlier."),
]
scores = model.predict(sentence_pairs)

print(scores)  # Example output (rounded): [0.9184 0.0123]
```

### Output
The model returns similarity scores in the range `[0, 1]`, where higher scores indicate stronger semantic similarity.

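Because every pair is scored on the same `[0, 1]` scale, the scores can also rank several candidates against one query, for example to pick the response closest to a reference. This sketch assumes only a `predict` callable shaped like `CrossEncoder.predict`; `rank_candidates` and its `top_k` argument are hypothetical helpers for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rank_candidates(
    predict: Callable[[List[Tuple[str, str]]], Sequence[float]],
    query: str,
    candidates: Sequence[str],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs sorted from most to least similar to `query`."""
    pairs = [(query, c) for c in candidates]
    scores = [float(s) for s in predict(pairs)]  # one batched call for all candidates
    ranked = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Usage with a model loaded as in the Usage section:
# rank_candidates(model.predict, "It's a wonderful day outside.",
#                 ["It's so sunny today!", "He drove to work earlier."])
```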

---

## Training Details

### Pretraining
The model was pretrained on the `pair-score-sampled` subset of the [`dleemiller/wiki-sim`](https://huggingface.co/datasets/dleemiller/wiki-sim) dataset. This dataset provides diverse sentence pairs with semantic similarity scores, which helps the model learn a robust notion of how sentences relate.
- **Classifier Dropout:** a relatively large classifier dropout of 0.3, to reduce overreliance on the teacher scores.
- **Objective:** match the similarity scores assigned by the teacher model, `cross-encoder/stsb-roberta-large` (trained on STS-B).

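The card does not specify the loss used to fit the teacher scores or exactly where dropout sits in the head. As an assumption for illustration only, the toy sketch below applies inverted dropout (p = 0.3) to pooled features, computes a single logit, and compares it with the teacher's score using binary cross-entropy with the teacher score as a soft target, which is one common way to distill scores in `[0, 1]`. The helper names and the single linear layer are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def soft_bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy between sigmoid(logit) and a soft target in [0, 1].

    Uses the numerically stable form max(z, 0) - z * t + log(1 + exp(-|z|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def dropout(features: Sequence[float], p: float, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each feature with probability p, scale survivors by 1 / (1 - p)."""
    keep = 1.0 - p
    return [f / keep if rng.random() >= p else 0.0 for f in features]


def head_loss(
    pooled: Sequence[float],
    weights: Sequence[float],
    bias: float,
    teacher_score: float,
    p: float = 0.3,  # the classifier dropout rate stated above
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> float:
    """Toy single-layer head: dropout (training only), one logit, soft BCE against the teacher."""
    if rng is None:
        rng = random.Random()
    feats = dropout(pooled, p, rng) if training else list(pooled)
    logit = sum(w * f for w, f in zip(weights, feats)) + bias
    return soft_bce_with_logits(logit, teacher_score)
```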

### Fine-Tuning
Fine-tuning was performed on the [`sentence-transformers/stsb`](https://huggingface.co/datasets/sentence-transformers/stsb) dataset.

---

## Model Card

- **Architecture:** NeoBERT
- **Pretraining Data:** `dleemiller/wiki-sim` (`pair-score-sampled`)
- **Fine-Tuning Data:** `sentence-transformers/stsb`

---

## Thank You

Thanks to the chandar-lab team for providing the NeoBERT models, and to the Sentence Transformers team for their leadership in transformer encoder models.

---

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{neocests2025,
  author = {Miller, D. Lee},
  title = {NeoCE STS: An STS cross encoder model},
  year = {2025},
  publisher = {Hugging Face Hub},
  url = {https://huggingface.co/dleemiller/NeoCE-sts},
}
```

---

## License

This model is licensed under the [MIT License](LICENSE).