spisupat committed on
Commit 3a48bd4 · verified · 1 Parent(s): 40eda1d

Update index.html

Files changed (1): index.html +4 -4
index.html CHANGED
@@ -112,7 +112,7 @@
       <!-- Logo -->
       <div class="columns is-centered has-text-centered">
         <div class="column is-2">
-          <img src="atla-logo.png" alt="Atla Logo" style="width: 50%">
+          <img src="figs/atla-logo.png" alt="Atla Logo" style="width: 50%">
         </div>
       </div>
 
@@ -146,7 +146,7 @@
       Human evaluation is time-consuming and expensive, and scales poorly with volume and complexity – hence the need for scalable, automated techniques. As generative models have become more capable, the field has addressed this need by using LLMs themselves to evaluate other LLMs' responses, producing judgments and natural language critiques without humans in the loop – an approach also known as "LLM-as-a-judge" (LLMJ).
       </p>
       <figure class="image">
-        <img src="Fig1.png" alt="Performance comparison">
+        <img src="figs/Fig1.png" alt="Performance comparison">
         <figcaption>
           <b>Figure 1:</b> Atla Selene Mini outperforms current state-of-the-art SLMJs: a) Overall task-average performance, comparing Atla Selene Mini (black) with the best and most widely used SLMJs. b) Breakdown of performance by task type and benchmark.
         </figcaption>
@@ -165,7 +165,7 @@
       </p>
 
       <figure class="image">
-        <img src="Fig2.png" alt="Data curation strategy">
+        <img src="figs/Fig2.png" alt="Data curation strategy">
         <figcaption>
           <b>Figure 2:</b> Data curation strategy: The process of transforming a candidate dataset (left) into the final training mix (right). Yellow boxes indicate filtering steps, purple represents synthetic generation of chosen and rejected pairs for preference optimization.
         </figcaption>
@@ -366,7 +366,7 @@
       </p>
 
      <figure class="image">
-        <img src="Fig3.png" alt="Real-world evaluation">
+        <img src="figs/Fig3.png" alt="Real-world evaluation">
         <figcaption>
           <b>Figure 3:</b> Real-world evaluation: a) Performance on domain-specific industry benchmarks b) Performance on RewardBench with different prompt formats c) Performance measured by ELO scores in Judge Arena.
         </figcaption>
 