Space: bethgelab/lm-similarity


Joschka Strueber committed on Feb 7

Commit bd1b20b (1 parent: 0d09d9a)

[Ref] back to markdown


Files changed (1)

app.py +8 -19


@@ -78,27 +78,16 @@ with gr.Blocks(title="LLM Similarity Analyzer", css=app_util.custom_css) as demo
     )
     gr.Markdown("## Information")
-    metric_info_html = r"""
-<!-- Include KaTeX CSS for styling -->
-<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-vZTGXXFDvM1R7zDKx2g5N5S4FcoFdTJuFTz1Xj2A2/J1j4fGmS7a6hLQ6ZPfF1sk" crossorigin="anonymous">
-<!-- Include KaTeX and its auto-render extension -->
-<script defer src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-6R6ckgSpF6yXUHg9+KJGXN9I+ik5U9dviDuzhSxrtk4AUaGr8/8Qovm6N9fl/hkz" crossorigin="anonymous"></script>
-<script defer src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js" integrity="sha384-mll67QQ8ErU7t8/QqU3m0Cq56E7i2xUeFYSv6O9V3CRjNdqPzqxK9z6gS9GQFj8D" crossorigin="anonymous"
-    onload="renderMathInElement(document.body);"></script>
-<div>
-  <p>
-    We propose Chance Adjusted Probabilistic Agreement ($\operatorname{CAPA}$, or $\kappa_p$), a novel metric
-    for model similarity which adjusts for chance agreement due to accuracy. Using CAPA, we find:
-  </p>
-  <ol>
-    <li>LLM-as-a-judge scores are biased towards more similar models controlling for the model's capability.</li>
-    <li>Gain from training strong models on annotations of weak supervisors (weak-to-strong generalization) is higher when the two models are more different.</li>
-    <li>Concerningly, model errors are getting more correlated as capabilities increase.</li>
-  </ol>
-</div>
+    metric_info_markdown = r"""
+We propose Chance Adjusted Probabilistic Agreement (\(\operatorname{CAPA}\), or \(\kappa_p\)), a novel metric for model similarity which adjusts for chance agreement due to accuracy.
+Using CAPA, we find:
+1. LLM-as-a-judge scores are biased towards more similar models controlling for the model's capability.
+2. Gain from training strong models on annotations of weak supervisors (weak-to-strong generalization) is higher when the two models are more different.
+3. Concerningly, model errors are getting more correlated as capabilities increase.
 """
-    gr.HTML(value=metric_info_html)
+    gr.Markdown(metric_info_markdown)
     with gr.Row():
         gr.Image(value="data/table_capa.png", label="Comparison of different similarity metrics for multiple-choice questions", elem_classes="image_container", interactive=False)
     gr.Markdown("""
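
The change above swaps a hand-rolled KaTeX `gr.HTML` block for `gr.Markdown` and switches the inline math delimiters from `$...$` to `\(...\)`. The diff does not show how (or whether) the Space tells Gradio to recognize those delimiters. The sketch below is one possible way, assuming the `latex_delimiters` parameter of `gr.Markdown` as described in the Gradio documentation, where the default recognizes only `$$...$$`. The names `LATEX_DELIMITERS`, `INFO_MARKDOWN`, and `build_demo` are hypothetical and do not come from `app.py`.

```python
# Hypothetical sketch, not the Space's actual code: enable \( ... \) inline math
# in a gr.Markdown component via its `latex_delimiters` parameter.

# Each entry names an opening/closing delimiter and whether the math is
# rendered as a display (block) expression or inline.
LATEX_DELIMITERS = [
    {"left": "$$", "right": "$$", "display": True},     # block math
    {"left": r"\(", "right": r"\)", "display": False},  # inline math, as used in the commit
]

# Shortened version of the Markdown string from the diff (raw string so the
# backslashes reach the renderer unchanged).
INFO_MARKDOWN = r"""
We propose Chance Adjusted Probabilistic Agreement (\(\operatorname{CAPA}\), or \(\kappa_p\)),
a novel metric for model similarity which adjusts for chance agreement due to accuracy.
"""


def build_demo():
    """Build a minimal Blocks app that renders INFO_MARKDOWN with inline LaTeX."""
    # Imported lazily so the constants above can be inspected without Gradio installed.
    import gradio as gr

    with gr.Blocks(title="LLM Similarity Analyzer") as demo:
        gr.Markdown("## Information")
        gr.Markdown(INFO_MARKDOWN, latex_delimiters=LATEX_DELIMITERS)
    return demo
```

With Gradio installed, `build_demo().launch()` would serve the page locally. Whether the inline expressions render without this parameter depends on the Gradio version in use, which the diff does not state.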