Niklas Hoepner committed
Commit · f0d7015
1 Parent(s): afabd8a
Tried fixing display of equation
app.py CHANGED
@@ -64,7 +64,7 @@ with gr.Blocks() as demo:

 ## 🧮 Scoring Logic

-Let
+Let $$l_{\text{yes}} $$ and $$ l_{\text{no}} $$ be the log-probabilities of "Yes" and "No", respectively.

 - If neither token is in the top-5:

@@ -83,8 +83,8 @@ with gr.Blocks() as demo:
 - the least likely top-5 token

 The score ranges from 0 to 1, where 1 indicates the highest confidence by the LLM that the predicted and reference answers are semantically equivalent.
-
 See [SPIQA paper](https://arxiv.org/pdf/2407.09413) for details.
+
 ---

 ## How to Use
@@ -134,7 +134,7 @@ with gr.Blocks() as demo:

 ---

-##
+## Examples

 ```python
 l3score = evaluate.load("nhop/L3Score")
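
Note on the "Scoring Logic" text touched by this commit: the hunks above show only fragments of the rule (the log-probabilities of "Yes" and "No", the top-5 cases, and the 0 to 1 range). The sketch below is one way that logic could be written, not the code in `app.py` or in the `nhop/L3Score` metric. The function name `l3score_from_logprobs` and the `top_logprobs` dictionary are hypothetical. The handling of the "neither token" and "only one token" cases follows my reading of the SPIQA paper and is an assumption, since those lines are elided from the diff.

```python
import math


def l3score_from_logprobs(top_logprobs, yes_token="Yes", no_token="No"):
    """Illustrative sketch of a Yes/No log-probability score.

    top_logprobs: dict mapping each top-k token to its log-probability
    (hypothetical input format, e.g. the top-5 tokens returned by an LLM API).
    """
    l_yes = top_logprobs.get(yes_token)
    l_no = top_logprobs.get(no_token)

    # Assumption: if neither token is in the top-k, the score is 0.
    if l_yes is None and l_no is None:
        return 0.0

    # Both tokens present: renormalise the two probabilities.
    if l_yes is not None and l_no is not None:
        p_yes, p_no = math.exp(l_yes), math.exp(l_no)
        return p_yes / (p_yes + p_no)

    # Assumption: only one token present. Estimate the missing token's
    # probability as the smaller of (a) the remaining probability mass and
    # (b) the probability of the least likely top-k token.
    probs = [math.exp(lp) for lp in top_logprobs.values()]
    remaining_mass = max(0.0, 1.0 - sum(probs))
    p_missing = min(remaining_mass, min(probs))

    if l_yes is not None:
        p_yes = math.exp(l_yes)
        return p_yes / (p_yes + p_missing)

    p_no = math.exp(l_no)
    return p_missing / (p_missing + p_no)
```

Under these assumptions the result always lies between 0 and 1, which matches the range stated in the README text; values close to 1 mean the model puts most of its Yes/No probability on "Yes".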