Commit 0ca5bff (1 parent: 9b652c8), committed by Niklas Hoepner

Put title above input box

app.py (changed)
@@ -18,6 +18,9 @@ def compute_l3score(api_key, provider, model, questions, predictions, references
         return {"error": str(e)}
 
 with gr.Blocks() as demo:
+    gr.Markdown(r"""
+    <h1 align="center"> Metric: L3Score </h1>
+    """)
 
 
     with gr.Row():
@@ -40,7 +43,6 @@ with gr.Blocks() as demo:
     )
 
     gr.Markdown(r"""
-    <h1 align="center"> Metric: L3Score </h1>
 
     ## 📌 Description
     **L3Score** evaluates how semantically close a model-generated answer is to a reference answer for a given question. It prompts a **language model as a judge** using:
@@ -62,7 +64,7 @@ with gr.Blocks() as demo:
 
     ## 🧮 Scoring Logic
 
-    Let $l_{\text{yes}}$ and $l_{\text{no}}$ be the log-probabilities of "Yes" and "No", respectively.
+    Let $ l_{\text{yes}} $ and $ l_{\text{no}} $ be the log-probabilities of "Yes" and "No", respectively.
 
     - If neither token is in the top-5:
 
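Note on the Scoring Logic hunk: the diff context only shows the definition of the two log-probabilities, and the rest of the scoring rules fall outside the visible lines. As a rough illustration of how a pair of log-probabilities for "Yes" and "No" could be turned into a single score, the sketch below renormalizes them over just those two tokens. The function name `yes_probability` is hypothetical, the normalization is an assumption rather than the code in app.py, and the top-5 cases mentioned in the hunk are not modeled.

```python
import math


def yes_probability(l_yes: float, l_no: float) -> float:
    """Return P("Yes") renormalized over the {"Yes", "No"} pair.

    Assumption: both log-probabilities are available; this is an
    illustrative sketch, not the scoring code from app.py.
    """
    # Subtract the larger value before exponentiating for numerical stability.
    m = max(l_yes, l_no)
    p_yes = math.exp(l_yes - m)
    p_no = math.exp(l_no - m)
    return p_yes / (p_yes + p_no)
```

With equal log-probabilities the result is 0.5, and it moves toward 1 as "Yes" becomes more likely than "No".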