Replace cost estimates with actual costs based on token usage
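The commit replaces estimated costs with costs computed from actual token usage, but the diff does not include the pricing script. Below is a minimal sketch that assumes the usual API billing model, with separate per-million-token rates for input and output. `Pricing`, `api_cost_usd`, and the example rates are hypothetical. They are not the author's code and not any provider's real prices. The sketch also reflects the o1 footnote in the diff: it assumes reasoning tokens are billed at the output rate, so leaving them out (`reasoning_tokens=0`) only gives a lower bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (placeholders, not real rates)."""
    input_per_mtok: float
    output_per_mtok: float


def api_cost_usd(input_tokens: int, output_tokens: int, pricing: Pricing,
                 reasoning_tokens: int = 0) -> float:
    """Cost of a benchmark run computed from measured token counts.

    Assumption: reasoning tokens (o1-style models) are billed at the output
    rate, so calling this with reasoning_tokens=0 yields a lower bound.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * pricing.input_per_mtok
            + billed_output * pricing.output_per_mtok) / 1_000_000


if __name__ == "__main__":
    p = Pricing(input_per_mtok=2.0, output_per_mtok=8.0)  # made-up rates
    lower = api_cost_usd(1_000_000, 500_000, p)
    full = api_cost_usd(1_000_000, 500_000, p, reasoning_tokens=250_000)
    print(f"lower bound ${lower:.2f}, with reasoning tokens ${full:.2f}")
```

Keeping reasoning tokens as a separate argument makes the lower-bound nature of the "$198.74" footnote explicit: any non-zero reasoning usage can only raise the total.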
README.md (changed)
@@ -37,13 +37,16 @@ while providing detailed, structured justifications on long inputs of up to 32k
 
 Rank | Model | Test Samples | Pass@1 Rate (%) | Cost ($)
 | --- | --- | --- | --- | --- |
-**1** | **Root Judge**
-2 | GPT-4o | 14900 | 86.1 |
-3 | o1-preview | 14899 | 85.3 | 1062
-4 | Claude Sonnet-3.5 | 14797 | 85.2 |
-5 | Llama3.1-70b-Instruct| 13969 | 84.7 |
+**1** | **Root Judge** | 14900 | **86.3** | **3.98**
+2 | GPT-4o | 14900 | 86.1 | 33.12
+3 | o1-preview | 14899 | 85.3 | 1062*
+4 | Claude Sonnet-3.5 | 14797 | 85.2 | 42.94
+5 | Llama3.1-70b-Instruct| 13969 | 84.7 | 27.43
 6 | o1-mini | 14655 | 83.7 | 156
-7 | Llama3.1-405b-Instruct | 14881 | 83.6 |
+7 | Llama3.1-405b-Instruct | 14881 | 83.6 | 269.82
+
+`*`=benchmarked as o1-preview; at current o1 prices, without reasoning tokens, the cost would start at $198.74 instead
+Local Costs based on lambdalabs instances at January 2025 prices
 
 [🔎 Detailed Performance Breakdown - Hallucination Detection](https://docs.google.com/spreadsheets/d/1NM9VgGG9_-1kQbepeoueUTkvT1bDeRndTD4RM5iV7l4/edit?usp=sharing)
 
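The added note says self-hosted models were costed from Lambda Labs instance prices as of January 2025, but it gives no hourly rate, instance count, or runtime. The sketch below assumes local cost is simply hourly rate × number of instances × wall-clock hours. `local_cost_usd`, `cost_per_1k_samples`, and the example rate and runtime are hypothetical. A second helper normalizes each total by the Test Samples column, since the runs did not all cover the same number of samples (13969 to 14900). The `RUNS` figures are copied from the updated table.

```python
def local_cost_usd(hourly_rate_usd: float, num_instances: int,
                   wall_clock_seconds: float) -> float:
    """Hypothetical rental-cost model: rate x instances x wall-clock hours."""
    hours = wall_clock_seconds / 3600
    return hourly_rate_usd * num_instances * hours


def cost_per_1k_samples(total_cost_usd: float, test_samples: int) -> float:
    """Normalize a run's total cost by the number of test samples it covered."""
    return 1000 * total_cost_usd / test_samples


# (Cost ($), Test Samples) copied from the updated README table.
# o1-preview's 1062 is the as-benchmarked figure marked with `*` in the table.
RUNS = {
    "Root Judge": (3.98, 14900),
    "GPT-4o": (33.12, 14900),
    "o1-preview": (1062.0, 14899),
    "Claude Sonnet-3.5": (42.94, 14797),
    "Llama3.1-70b-Instruct": (27.43, 13969),
    "o1-mini": (156.0, 14655),
    "Llama3.1-405b-Instruct": (269.82, 14881),
}

if __name__ == "__main__":
    # Hypothetical run: 2 instances at $1.50/hour for 90 minutes.
    print(f"local run: ${local_cost_usd(1.50, 2, 90 * 60):.2f}")
    # Rank models from cheapest to most expensive per 1k samples.
    ranked = sorted(RUNS, key=lambda m: cost_per_1k_samples(*RUNS[m]))
    for model in ranked:
        print(f"{model}: ${cost_per_1k_samples(*RUNS[model]):.3f} per 1k samples")
```

Normalizing per sample keeps models that skipped some test items (for example, Llama3.1-70b-Instruct at 13969 samples) from looking cheaper only because they processed fewer inputs.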