root-signals/RootSignals-Judge-Llama-70B

@@ -37,13 +37,16 @@ while providing detailed, structured justifications on long inputs of up to 32k
 Rank | Model | Test Samples | Pass@1 Rate (%) | Cost ($)
 | --- | --- | --- | --- | --- |
-**1** | **Root Judge** (FP8) | 14900 | **86.3** | **34**
-2 | GPT-4o | 14900 | 86.1 | -
-3 | o1-preview | 14899 | 85.3 | 1062
-4 | Claude Sonnet-3.5 | 14797 |  85.2 | -
-5 | Llama3.1-70b-Instruct| 13969 | 84.7  | 34
 6 | o1-mini | 14655 | 83.7 | 156
-7 | Llama3.1-405b-Instruct | 14881 | 83.6  | -
 [🔎 Detailed Performance Breakdown - Hallucination Detection](https://docs.google.com/spreadsheets/d/1NM9VgGG9_-1kQbepeoueUTkvT1bDeRndTD4RM5iV7l4/edit?usp=sharing)

 Rank | Model | Test Samples | Pass@1 Rate (%) | Cost ($)
 | --- | --- | --- | --- | --- |
+**1** | **Root Judge** | 14900 | **86.3** | **3.98**
+2 | GPT-4o | 14900 | 86.1 | 33.12
+3 | o1-preview | 14899 | 85.3 | 1062*
+4 | Claude Sonnet-3.5 | 14797 | 85.2 | 42.94
+5 | Llama3.1-70b-Instruct | 13969 | 84.7 | 27.43
 6 | o1-mini | 14655 | 83.7 | 156
+7 | Llama3.1-405b-Instruct | 14881 | 83.6 | 269.82
+
+`*` Benchmarked with o1-preview. At current o1 prices, excluding reasoning tokens, the cost would start at $198.74 instead.
+
+Costs for locally hosted models are based on Lambda Labs instances at January 2025 prices.
+
 [🔎 Detailed Performance Breakdown - Hallucination Detection](https://docs.google.com/spreadsheets/d/1NM9VgGG9_-1kQbepeoueUTkvT1bDeRndTD4RM5iV7l4/edit?usp=sharing)
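The models in the updated table were scored on slightly different numbers of test samples (13,969 to 14,900), so the raw cost column is easier to compare once it is normalized. The following is a minimal Python sketch that assumes each cost figure is the total dollar cost of one full benchmark run. The helper names `cost_per_1k_samples` and `relative_cost` are hypothetical illustrations and are not part of any Root Signals tooling.

```python
# Figures copied from the updated hallucination-detection table.
# Assumption: "Cost ($)" is the total cost of one full benchmark run.
# The o1-preview cost is the asterisked o1-preview figure, used as-is.
RESULTS: dict[str, tuple[int, float, float]] = {
    # model: (test samples, pass@1 rate in %, cost in $)
    "Root Judge": (14900, 86.3, 3.98),
    "GPT-4o": (14900, 86.1, 33.12),
    "o1-preview": (14899, 85.3, 1062.0),
    "Claude Sonnet-3.5": (14797, 85.2, 42.94),
    "Llama3.1-70b-Instruct": (13969, 84.7, 27.43),
    "o1-mini": (14655, 83.7, 156.0),
    "Llama3.1-405b-Instruct": (14881, 83.6, 269.82),
}


def cost_per_1k_samples(samples: int, cost: float) -> float:
    """Hypothetical helper: dollars spent per 1,000 evaluated samples."""
    return cost / samples * 1000


def relative_cost(model: str, baseline: str = "Root Judge") -> float:
    """Hypothetical helper: total run cost as a multiple of the baseline's."""
    return RESULTS[model][2] / RESULTS[baseline][2]


if __name__ == "__main__":
    # One row per model: pass rate, normalized cost, and cost multiple.
    for name, (samples, pass_rate, cost) in RESULTS.items():
        per_1k = cost_per_1k_samples(samples, cost)
        print(f"{name:<24} {pass_rate:5.1f}%  ${per_1k:7.2f}/1k  {relative_cost(name):6.1f}x")
```

Normalizing per 1,000 samples keeps a model that was scored on fewer samples from looking cheaper than it is. For example, Llama3.1-70b-Instruct was evaluated on 13,969 samples rather than 14,900.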