adjust readme

Files changed: README.md (+3 −2), semscore.py (+1 −2)
README.md
CHANGED
@@ -20,7 +20,6 @@ When loading SemScore, you can choose any pre-trained encoder-only model uploade
 
 ```python
 import evaluate
-
 semscore = evaluate.load("semscore", "model_name")
 ```
 
@@ -40,7 +39,6 @@ Its optional arguments are:
 ```python
 predictions = ['This is an example sentence', 'Each sentence is considered']
 references = ['This is an example sentence', 'Each sentence is considered']
-
 results = semscore.compute(predictions=predictions, references=references, batch_size=2, device="cuda:0")
 ```
 
@@ -57,7 +55,9 @@ The output of SemScore is a dictionary with the following values:
 One limitation of SemScore is its dependence on an underlying transformer model to compute semantic textual similarity between model and target outputs. This implementation relies on the strongest sentence transformer model, as reported by the authors of the `sentence-transformers` library, by default. However, better embedding models have become available since the publication of the SemScore paper (e.g. those listed in the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard)).
 
 In addition, a more general limitation is that SemScore requires at least one gold-standard target output against which to compare a generated response. This target output should be human created or at least human-vetted.
+
 ## Citation
+```bibtex
 @misc{semscore,
 title={SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity},
 author={Ansar Aynetdinov and Alan Akbik},
@@ -67,6 +67,7 @@ In addition, a more general limitation is that SemScore requires at least one go
 primaryClass={cs.CL},
 url={https://arxiv.org/abs/2401.17072},
 }
+```
 
 ## Further References
 - [SemScore paper](https://arxiv.org/abs/2401.17072)
semscore.py
CHANGED
@@ -32,8 +32,7 @@ _CITATION = """\
 """
 
 _DESCRIPTION = """\
-SemScore measures semantic textual similarity between candidate and reference texts.
-strongly correlate with human judgment on a system-level when evaluating instruction-tuned models.
+SemScore measures semantic textual similarity between candidate and reference texts.
 """
 
 
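
The `_DESCRIPTION` above says SemScore measures semantic textual similarity between candidate and reference texts. As an illustration of the idea only (not this module's actual implementation, which embeds texts with a sentence-transformers model), here is a minimal NumPy sketch: pair-wise cosine similarity between precomputed embeddings, averaged over the corpus. The embedding vectors below are toy values, not real model outputs.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semscore_from_embeddings(pred_embs: np.ndarray, ref_embs: np.ndarray) -> float:
    # One prediction/reference pair per row; the score is the corpus-level mean
    # of the per-pair cosine similarities.
    sims = [cosine_similarity(p, r) for p, r in zip(pred_embs, ref_embs)]
    return float(np.mean(sims))

# Toy embeddings: identical prediction/reference pairs score 1.0.
preds = np.array([[1.0, 0.0], [0.0, 1.0]])
refs = np.array([[1.0, 0.0], [0.0, 1.0]])
print(semscore_from_embeddings(preds, refs))  # 1.0
```

In the real metric, `pred_embs` and `ref_embs` would come from encoding the prediction and reference strings with the chosen encoder model; the averaging step is what makes it a system-level score.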