nhop committed on
Commit 0fd3cac · verified · 1 Parent(s): 8f7a170

Update README.md

Files changed (1)
  1. README.md +12 -14
README.md CHANGED
@@ -1,19 +1,20 @@
  ---
  title: L3Score
  datasets:
- - google/spiqa
+ - google/spiqa
  tags:
- - evaluate
- - metric
- - semantic-similarity
- - qa
- - llm-eval
+ - evaluate
+ - metric
+ - semantic-similarity
+ - qa
+ - llm-eval
  description: >
- L3Score is a metric for evaluating the semantic similarity of free-form answers in question answering tasks.
- It uses log-probabilities of "Yes"/"No" tokens from a language model acting as a judge.
- Based on the SPIQA benchmark: https://arxiv.org/pdf/2407.09413
+ L3Score is a metric for evaluating the semantic similarity of free-form
+ answers in question answering tasks. It uses log-probabilities of "Yes"/"No"
+ tokens from a language model acting as a judge. Based on the SPIQA benchmark:
+ https://arxiv.org/pdf/2407.09413
  sdk: gradio
- sdk_version: 3.19.1
+ sdk_version: 5.25.1
  app_file: app.py
  pinned: false
  ---
@@ -161,7 +162,4 @@ score = l3score.compute(
 
  - 🤗 [Dataset on Hugging Face](https://huggingface.co/datasets/google/spiqa)
  - 🐙 [GitHub Repository](https://github.com/google/spiqa)
- - 📄 [SPIQA Paper (arXiv:2407.09413)](https://arxiv.org/pdf/2407.09413)
-
-
-
+ - 📄 [SPIQA Paper (arXiv:2407.09413)](https://arxiv.org/pdf/2407.09413)
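The README description says L3Score turns a judge model's log-probabilities for the "Yes" and "No" tokens into a score. Below is a minimal Python sketch of that idea. It assumes the score is a two-way softmax over the two log-probabilities, that is, P(Yes) / (P(Yes) + P(No)). The function name `l3score_from_logprobs` is hypothetical and is not part of the `l3score` API referenced in the diff. The sketch also does not cover how the actual metric handles a token that is missing from the judge's returned log-probabilities.

```python
import math


def l3score_from_logprobs(logprob_yes: float, logprob_no: float) -> float:
    """Normalize a judge's "Yes"/"No" log-probabilities into a score in [0, 1].

    Assumption (illustrative only): the score is the softmax of the two
    log-probabilities, i.e. P(Yes) / (P(Yes) + P(No)). This is not the
    metric's actual implementation.
    """
    # Subtract the larger log-probability before exponentiating so that
    # very negative log-probabilities do not underflow both terms to zero.
    m = max(logprob_yes, logprob_no)
    p_yes = math.exp(logprob_yes - m)
    p_no = math.exp(logprob_no - m)
    return p_yes / (p_yes + p_no)


if __name__ == "__main__":
    # Hypothetical log-probabilities returned by a judge model.
    print(round(l3score_from_logprobs(math.log(0.9), math.log(0.1)), 3))  # prints 0.9
```

Renormalizing over only the two answer tokens means the score ignores probability mass the judge assigns to any other token. Subtracting the maximum first is the standard log-sum-exp trick for numerical stability.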