<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="description" content="Atla Selene Mini: A General Purpose Evaluation Model">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Atla Selene Mini: A General Purpose Evaluation Model</title>
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<link rel="icon" href="./static/images/favicon.svg">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js"></script>
</head>
<body>
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">Atla Selene Mini:<br>A General Purpose Evaluation Model</h1>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://huggingface.co/inwaves" target="_blank">Andrei Alexandru</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/NinaCalvi" target="_blank">Antonia Calvi</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/HennersBro98" target="_blank">Henry Broomfield</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/jacksongolden" target="_blank">Jackson Golden</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/kaikaidai" target="_blank">Kyle Dai</a><sup>1</sup>,</span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://huggingface.co/mathias-atla" target="_blank">Mathias Leys</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/MauriceBurg" target="_blank">Maurice Burger</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/mbartolo" target="_blank">Max Bartolo</a><sup>2,3</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/RomanEngeler1805" target="_blank">Roman Engeler</a><sup>1</sup>,</span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://huggingface.co/spisupat" target="_blank">Sashank Pisupati</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/tobydrane" target="_blank">Toby Drane</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://huggingface.co/youngsunpark" target="_blank">Young Sun Park</a><sup>1</sup></span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>atla,</span>
<span class="author-block"><sup>2</sup>University College London,</span>
<span class="author-block"><sup>3</sup>Cohere</span>
</div>
<div class="column has-text-centered">
<div class="publication-links">
<!-- PDF Link -->
<span class="link-block">
<a href="arxiv_submitted.pdf" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>Paper</span>
</a>
</span>
<!-- HuggingFace Link -->
<span class="link-block">
<a href="https://hf.co/AtlaAI/Selene-1-Mini-Llama-3.1-8B" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
🤗
</span>
<span>HuggingFace</span>
</a>
</span>
<!-- Github Link -->
<span class="link-block">
<a href="https://github.com/atla-ai/selene-mini" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
</span>
<span>Code</span>
</a>
</span>
<!-- Ollama Link -->
<span class="link-block">
<a href="https://ollama.com/atla/selene-mini" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-code"></i>
</span>
<span>Ollama</span>
</a>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section class="section" style="padding-top: 0;">
<div class="container is-max-desktop">
<!-- Logo -->
<div class="columns is-centered has-text-centered">
<div class="column is-2">
<div style="max-width: 200px; margin: 0 auto;">
<img src="figs/atla-logo.png" alt="Atla Logo" style="width: 100%; height: auto;">
</div>
</div>
</div>
<!-- Abstract -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>
We introduce Atla Selene Mini, a state-of-the-art small language model-as-a-judge (SLMJ). Selene Mini is a general-purpose evaluator that outperforms the best SLMJs and GPT-4o-mini on overall performance across 11 out-of-distribution benchmarks, spanning absolute scoring, classification, and pairwise preference tasks. It is the highest-scoring 8B generative model on RewardBench, surpassing strong baselines like GPT-4o and specialized judges.
</p>
<p>
To achieve this, we develop a principled data curation strategy that augments public datasets with synthetically generated critiques and ensures high quality through filtering and dataset ablations. We train our model on a combined direct preference optimization (DPO) and supervised fine-tuning (SFT) loss, and produce a highly promptable evaluator that excels in real-world scenarios.
</p>
<p>
Selene Mini shows dramatically improved zero-shot agreement with human expert evaluations on financial and medical industry datasets. It is also robust to variations in prompt format. Preliminary results indicate that Selene Mini is the top-ranking evaluator in a live, community-driven <a href="https://huggingface.co/blog/arena-atla" target="_blank">Judge Arena</a>. We release the model weights on <a href="https://hf.co/AtlaAI/Selene-1-Mini-Llama-3.1-8B" target="_blank">HuggingFace</a> and <a href="https://ollama.com/atla/selene-mini" target="_blank">Ollama</a> to encourage widespread community adoption.
</p>
</div>
</div>
</div>
<!-- Demo GIF -->
<!-- <div class="columns is-centered">
<div class="column is-four-fifths">
<div class="content has-text-centered">
<img src="/api/placeholder/800/400" alt="<placeholder for GIF>"/>
<p class="subtitle">
Demo of Atla Selene Mini in action
</p>
</div>
</div>
</div> -->
<!-- Key Results -->
<div class="columns is-centered">
<div class="column is-four-fifths">
<h2 class="title is-3 has-text-centered">Key Results</h2>
<div class="content has-text-justified">
<div class="columns is-centered has-text-centered">
Read the full technical report <a href="arxiv_submitted.pdf" target="_blank">here</a>.
</div>
<figure class="image">
<img src="figs/Fig1.png" alt="Performance comparison">
<figcaption>
<b>Figure 1:</b> Atla Selene Mini outperforms current state-of-the-art SLMJs: a) Overall task-average performance, comparing Atla Selene Mini (black) with the best and most widely used SLMJs. b) Breakdown of performance by task type and benchmark.
</figcaption>
</figure>
<figure class="image">
<img src="figs/Fig2.png" alt="Data curation strategy">
<figcaption>
<b>Figure 2:</b> Data curation strategy: The process of transforming a candidate dataset (left) into the final training mix (right). Yellow boxes indicate filtering steps; purple boxes represent synthetic generation of chosen and rejected pairs for preference optimization.
</figcaption>
</figure>
<figure class="image">
<img src="figs/Fig3.png" alt="Real-world evaluation">
<figcaption>
<b>Figure 3:</b> Real-world evaluation: a) Performance on domain-specific industry benchmarks. b) Performance on RewardBench with different prompt formats. c) Performance measured by Elo scores in Judge Arena.
</figcaption>
</figure>
<div class="columns is-centered has-text-centered">
Our larger model from the Selene family will be released soon. Sign up to our <a href="https://www.atla-ai.com/sign-up-waitlist" target="_blank">waitlist</a> to get first access.
</div>
</div>
</div>
</div>
</div>
</section>
<footer class="footer">
<div class="container">
<div class="content has-text-centered">
<p>
© 2025 Atla AI
</p>
</div>
</div>
</footer>
</body>
</html>