Space: nhop/L3Score
nhop committed on Apr 15

Commit 0fd3cac · verified · 1 Parent(s): 8f7a170

Update README.md

Files changed (1)

README.md CHANGED

@@ -1,19 +1,20 @@
 ---
 title: L3Score
 datasets:
-  - google/spiqa
+- google/spiqa
 tags:
-  - evaluate
-  - metric
-  - semantic-similarity
-  - qa
-  - llm-eval
+- evaluate
+- metric
+- semantic-similarity
+- qa
+- llm-eval
 description: >
-  L3Score is a metric for evaluating the semantic similarity of free-form answers in question answering tasks.
-  It uses log-probabilities of "Yes"/"No" tokens from a language model acting as a judge.
-  Based on the SPIQA benchmark: https://arxiv.org/pdf/2407.09413
+  L3Score is a metric for evaluating the semantic similarity of free-form
+  answers in question answering tasks. It uses log-probabilities of "Yes"/"No"
+  tokens from a language model acting as a judge. Based on the SPIQA benchmark:
+  https://arxiv.org/pdf/2407.09413
 sdk: gradio
-sdk_version: 3.19.1
+sdk_version: 5.25.1
 app_file: app.py
 pinned: false
 ---
@@ -161,7 +162,4 @@ score = l3score.compute(
 - 🤗 [Dataset on Hugging Face](https://huggingface.co/datasets/google/spiqa)
 - 🐙 [GitHub Repository](https://github.com/google/spiqa)
-- 📄 [SPIQA Paper (arXiv:2407.09413)](https://arxiv.org/pdf/2407.09413)
+- 📄 [SPIQA Paper (arXiv:2407.09413)](https://arxiv.org/pdf/2407.09413)
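
Reviewer note: the `description` field says the metric is built from the log-probabilities of the "Yes"/"No" tokens returned by a judge model, but the diff does not show how the two values are combined. A minimal sketch of one plausible combination follows, assuming a two-way softmax over the pair. The function name `yes_probability` and its parameters are hypothetical and are not taken from this Space's `app.py`.

```python
import math


def yes_probability(logprob_yes: float, logprob_no: float) -> float:
    """Renormalize "Yes"/"No" log-probabilities into a score in [0, 1].

    Assumption: the score is the softmax weight of "Yes" over the pair
    ("Yes", "No"). This is an illustrative sketch, not the Space's code.
    """
    # Subtract the larger log-probability first so exp() cannot overflow.
    shift = max(logprob_yes, logprob_no)
    exp_yes = math.exp(logprob_yes - shift)
    exp_no = math.exp(logprob_no - shift)
    return exp_yes / (exp_yes + exp_no)
```

For example, if the judge assigns probability 0.6 to "Yes" and 0.2 to "No", `yes_probability(math.log(0.6), math.log(0.2))` evaluates to approximately 0.75, because the remaining probability mass on other tokens is ignored by the renormalization.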