spisupat committed on
Commit 3a48bd4 · verified · 1 Parent(s): 40eda1d

Update index.html

Files changed (1): index.html +4 -4
index.html CHANGED
@@ -112,7 +112,7 @@
       <!-- Logo -->
       <div class="columns is-centered has-text-centered">
         <div class="column is-2">
-          <img src="atla-logo.png" alt="Atla Logo" style="width: 50%">
+          <img src="figs/atla-logo.png" alt="Atla Logo" style="width: 50%">
         </div>
       </div>
 
@@ -146,7 +146,7 @@
       Human evaluation is time-consuming and expensive, and scales poorly with volume and complexity – hence the need for scalable, automated techniques. As generative models have become more capable, the field has addressed this need by using LLMs themselves to evaluate other LLMs' responses, producing judgments and natural language critiques without humans in the loop – an approach also known as "LLM-as-a-judge" (LLMJ).
       </p>
       <figure class="image">
-        <img src="Fig1.png" alt="Performance comparison">
+        <img src="figs/Fig1.png" alt="Performance comparison">
         <figcaption>
           <b>Figure 1:</b> Atla Selene Mini outperforms current state-of-the-art SLMJs: a) Overall task-average performance, comparing Atla Selene Mini (black) with the best and most widely used SLMJs. b) Breakdown of performance by task type and benchmark.
         </figcaption>
@@ -165,7 +165,7 @@
       </p>
 
       <figure class="image">
-        <img src="Fig2.png" alt="Data curation strategy">
+        <img src="figs/Fig2.png" alt="Data curation strategy">
         <figcaption>
           <b>Figure 2:</b> Data curation strategy: The process of transforming a candidate dataset (left) into the final training mix (right). Yellow boxes indicate filtering steps, purple represents synthetic generation of chosen and rejected pairs for preference optimization.
         </figcaption>
@@ -366,7 +366,7 @@
       </p>
 
      <figure class="image">
-        <img src="Fig3.png" alt="Real-world evaluation">
+        <img src="figs/Fig3.png" alt="Real-world evaluation">
         <figcaption>
           <b>Figure 3:</b> Real-world evaluation: a) Performance on domain-specific industry benchmarks b) Performance on RewardBench with different prompt formats c) Performance measured by ELO scores in Judge Arena.
         </figcaption>
 