aynetdia committed
Commit 9986ffb · 1 Parent(s): 6fc5137

adjust readme

Files changed (2):
  1. README.md +3 -2
  2. semscore.py +1 -2
README.md CHANGED

@@ -20,7 +20,6 @@ When loading SemScore, you can choose any pre-trained encoder-only model uploade
 
 ```python
 import evaluate
-
 semscore = evaluate.load("semscore", "model_name")
 ```
 
@@ -40,7 +39,6 @@ Its optional arguments are:
 ```python
 predictions = ['This is an example sentence', 'Each sentence is considered']
 references = ['This is an example sentence', 'Each sentence is considered']
-
 results = semscore.compute(predictions=predictions, references=references, batch_size=2, device="cuda:0")
 ```
 
@@ -57,7 +55,9 @@ The output of SemScore is a dictionary with the following values:
 One limitation of SemScore is its dependence on an underlying transformer model to compute semantic textual similarity between model and target outputs. This implementation relies on the strongest sentence transformer model, as reported by the authors of the `sentence-transformers` library, by default. However, better embedding models have become available since the publication of the SemScore paper (e.g. those listed in the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard)).
 
 In addition, a more general limitation is that SemScore requires at least one gold-standard target output against which to compare a generated response. This target output should be human created or at least human-vetted.
+
 ## Citation
+```bibtex
 @misc{semscore,
 title={SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity},
 author={Ansar Aynetdinov and Alan Akbik},
@@ -67,6 +67,7 @@ In addition, a more general limitation is that SemScore requires at least one go
 primaryClass={cs.CL},
 url={https://arxiv.org/abs/2401.17072},
 }
+```
 
 ## Further References
 - [SemScore paper](https://arxiv.org/abs/2401.17072)
semscore.py CHANGED

@@ -32,8 +32,7 @@ _CITATION = """\
 """
 
 _DESCRIPTION = """\
-SemScore measures semantic textual similarity between candidate and reference texts. It has been shown to
-strongly correlate with human judgment on a system-level when evaluating instruction-tuned models.
+SemScore measures semantic textual similarity between candidate and reference texts.
 """
 
 
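For context on the metric being documented: the README describes SemScore as comparing model outputs against references via an underlying sentence-transformer model, i.e. scoring by the cosine similarity of sentence embeddings. A minimal sketch of that core computation, using toy vectors in place of real sentence-transformer embeddings (the `cosine_similarity` helper and the example vectors are illustrative, not part of the actual implementation):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings" standing in for sentence-transformer outputs
# of a prediction and a reference; in the real metric these come from the
# chosen encoder model.
pred_emb = [0.2, 0.1, 0.9]
ref_emb = [0.2, 0.1, 0.9]

score = cosine_similarity(pred_emb, ref_emb)
print(round(score, 4))  # identical vectors -> 1.0
```

In the actual metric the embeddings are produced by the encoder passed to `evaluate.load`, and the per-pair similarities are aggregated into the returned dictionary.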