Commit 50af80f by TensorTemplar · verified · 1 Parent(s): 632624a

Replace cost estimates with actual costs based on token usage

Files changed (1)
  1. README.md +9 -6
README.md CHANGED
@@ -37,13 +37,16 @@ while providing detailed, structured justifications on long inputs of up to 32k
 
  Rank | Model | Test Samples | Pass@1 Rate (%) | Cost ($)
  | --- | --- | --- | --- | --- |
- **1** | **Root Judge** (FP8) | 14900 | **86.3** | **34**
- 2 | GPT-4o | 14900 | 86.1 | -
- 3 | o1-preview | 14899 | 85.3 | 1062
- 4 | Claude Sonnet-3.5 | 14797 | 85.2 | -
- 5 | Llama3.1-70b-Instruct| 13969 | 84.7 | 34
  6 | o1-mini | 14655 | 83.7 | 156
- 7 | Llama3.1-405b-Instruct | 14881 | 83.6 | -
 
  [🔎 Detailed Performance Breakdown - Hallucination Detection](https://docs.google.com/spreadsheets/d/1NM9VgGG9_-1kQbepeoueUTkvT1bDeRndTD4RM5iV7l4/edit?usp=sharing)
 
 
 
  Rank | Model | Test Samples | Pass@1 Rate (%) | Cost ($)
  | --- | --- | --- | --- | --- |
+ **1** | **Root Judge** | 14900 | **86.3** | **3.98**
+ 2 | GPT-4o | 14900 | 86.1 | 33.12
+ 3 | o1-preview | 14899 | 85.3 | 1062*
+ 4 | Claude Sonnet-3.5 | 14797 | 85.2 | 42.94
+ 5 | Llama3.1-70b-Instruct| 13969 | 84.7 | 27.43
  6 | o1-mini | 14655 | 83.7 | 156
+ 7 | Llama3.1-405b-Instruct | 14881 | 83.6 | 269.82
+
+ `*`=benchmarked as o1-preview; at current o1 prices, without reasoning tokens, the cost would start at $198.74 instead
+ Local Costs based on lambdalabs instances at January 2025 prices
 
  [🔎 Detailed Performance Breakdown - Hallucination Detection](https://docs.google.com/spreadsheets/d/1NM9VgGG9_-1kQbepeoueUTkvT1bDeRndTD4RM5iV7l4/edit?usp=sharing)
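
Review note: the Test Samples column differs between models (13969 to 14900), so the total costs in the updated table are not strictly like-for-like. The sketch below normalizes each total to USD per 1,000 samples. The sample counts and costs are copied from the new table rows; the helper names `cost_per_1k` and `normalized_costs` are hypothetical, and the normalization itself is an illustration rather than anything the README computes.

```python
# (model, test samples, total cost in USD), copied from the updated table.
# The o1-preview figure is the benchmarked cost flagged with `*` in the README.
ROWS = [
    ("Root Judge", 14900, 3.98),
    ("GPT-4o", 14900, 33.12),
    ("o1-preview", 14899, 1062.0),
    ("Claude Sonnet-3.5", 14797, 42.94),
    ("Llama3.1-70b-Instruct", 13969, 27.43),
    ("o1-mini", 14655, 156.0),
    ("Llama3.1-405b-Instruct", 14881, 269.82),
]


def cost_per_1k(samples: int, cost_usd: float) -> float:
    """Normalize a total run cost to USD per 1,000 evaluated samples."""
    return cost_usd / samples * 1000


def normalized_costs(rows):
    """Map each model name to its cost per 1,000 samples."""
    return {model: cost_per_1k(samples, cost) for model, samples, cost in rows}


if __name__ == "__main__":
    # Print models from cheapest to most expensive per 1,000 samples.
    ranked = sorted(normalized_costs(ROWS).items(), key=lambda item: item[1])
    for model, value in ranked:
        print(f"{model:<24} {value:8.2f} USD per 1k samples")
```

Normalizing this way does not change the ordering of the cheapest and most expensive entries, but it removes the small advantage a model gets from having completed fewer samples.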