aynetdia committed
Commit 9986ffb · 1 Parent(s): 6fc5137

adjust readme

Files changed (2):
  1. README.md +3 -2
  2. semscore.py +1 -2
README.md CHANGED

@@ -20,7 +20,6 @@ When loading SemScore, you can choose any pre-trained encoder-only model uploade
 
 ```python
 import evaluate
-
 semscore = evaluate.load("semscore", "model_name")
 ```
 
@@ -40,7 +39,6 @@ Its optional arguments are:
 ```python
 predictions = ['This is an example sentence', 'Each sentence is considered']
 references = ['This is an example sentence', 'Each sentence is considered']
-
 results = semscore.compute(predictions=predictions, references=references, batch_size=2, device="cuda:0")
 ```
 
@@ -57,7 +55,9 @@ The output of SemScore is a dictionary with the following values:
 One limitation of SemScore is its dependence on an underlying transformer model to compute semantic textual similarity between model and target outputs. This implementation relies on the strongest sentence transformer model, as reported by the authors of the `sentence-transformers` library, by default. However, better embedding models have become available since the publication of the SemScore paper (e.g. those listed in the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard)).
 
 In addition, a more general limitation is that SemScore requires at least one gold-standard target output against which to compare a generated response. This target output should be human created or at least human-vetted.
+
 ## Citation
+```bibtex
 @misc{semscore,
 title={SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity},
 author={Ansar Aynetdinov and Alan Akbik},
@@ -67,6 +67,7 @@ In addition, a more general limitation is that SemScore requires at least one go
 primaryClass={cs.CL},
 url={https://arxiv.org/abs/2401.17072},
 }
+```
 
 ## Further References
 - [SemScore paper](https://arxiv.org/abs/2401.17072)
semscore.py CHANGED

@@ -32,8 +32,7 @@ _CITATION = """\
 """
 
 _DESCRIPTION = """\
-SemScore measures semantic textual similarity between candidate and reference texts. It has been shown to
-strongly correlate with human judgment on a system-level when evaluating instruction-tuned models.
+SemScore measures semantic textual similarity between candidate and reference texts.
 """
 
 
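For context on the metric being documented: the README describes SemScore as comparing model outputs against references via an underlying sentence-transformer model, i.e. scoring by the cosine similarity of sentence embeddings. A minimal sketch of that core computation, using toy vectors in place of real sentence-transformer embeddings (the `cosine_similarity` helper and the example vectors are illustrative, not part of the actual implementation):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings" standing in for sentence-transformer outputs
# of a prediction and a reference; in the real metric these come from the
# chosen encoder model.
pred_emb = [0.2, 0.1, 0.9]
ref_emb = [0.2, 0.1, 0.9]

score = cosine_similarity(pred_emb, ref_emb)
print(round(score, 4))  # identical vectors -> 1.0
```

In the actual metric the embeddings are produced by the encoder passed to `evaluate.load`, and the per-pair similarities are aggregated into the returned dictionary.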