Joschka Strueber committed
Commit 69fd3ae · 1 parent: 274c92e
[Fix] mathjax in metric explanation
app.py
CHANGED
@@ -78,7 +78,7 @@ with gr.Blocks(title="LLM Similarity Analyzer", css=app_util.custom_css) as demo
     )
 
     gr.Markdown("## Information")
-    gr.Markdown("""We propose Chance Adjusted Probabilistic Agreement (\(\operatorname{CAPA}\), or \(\kappa_p\)), a novel metric \
+    gr.Markdown(r"""We propose Chance Adjusted Probabilistic Agreement (\(\operatorname{CAPA}\), or \(\kappa_p\)), a novel metric \
     for model similarity which adjusts for chance agreement due to accuracy. Using CAPA, we find: (1) LLM-as-a-judge scores are \
     biased towards more similar models controlling for the model's capability. (2) Gain from training strong models on annotations \
     of weak supervisors (weak-to-strong generalization) is higher when the two models are more different. (3) Concerningly, model \
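The fix is the raw-string prefix `r` on the literal, which lets the backslashes in the MathJax markup reach the browser untouched. In the text shown here none of the backslash sequences (`\(`, `\o`, `\k`) happens to be a valid Python escape, but Python 3.12+ emits a SyntaxWarning for them, and LaTeX commands such as `\frac` or `\text` would be silently corrupted by escape processing. A minimal sketch of the failure mode (illustrative, not taken from app.py):

```python
# Illustrative sketch (not from app.py): why MathJax text needs a raw string.
# In a plain literal, "\f" is a form feed, so the LaTeX command "\frac" is
# corrupted before gr.Markdown ever receives it (Python 3.12+ also warns
# about the remaining invalid escapes such as "\(").
plain = "\(\frac{a}{b}\)"  # "\f" collapses into the form-feed character
raw = r"\(\frac{a}{b}\)"   # the r prefix keeps every backslash verbatim

print(list(plain[:4]))  # ['\\', '(', '\x0c', 'r']  -> "\frac" is gone
print(list(raw[:4]))    # ['\\', '(', '\\', 'f']    -> intact for MathJax
```

Prefixing every MathJax-bearing literal with `r` sidesteps this whole class of collisions, rather than relying on each LaTeX command happening to miss Python's escape table.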