remove size from name column
README.md
CHANGED
@@ -56,7 +56,7 @@ Local Costs based on lambdalabs instances at January 2025 prices
 
 Rank | Model | VRAM (GB) | GSM8K (%) | IFEval (%) | MUSR-Murder (%) | MUSR-Object (%) | MUSR-Team (%) | Avg Score | Relative to Root Judge (%) |
 | ---|--------------|------------|--------|---------|--------------|--------------|------------|------------|--------------------|
-**1** | **Root Judge
+**1** | **Root Judge** | 70 | **94.6 ± 0.6** | **93.9** | 52.8 ± 3.2 | 24.6 ± 2.7 | **56.8 ± 3.1** | **64.5** | 100 |
 2 | Llama-3.3-70B | 140 | 94.4 ± 0.6 | 93.4 | 54.0 ± 3.2 | 23.4 ± 2.7 | 56.0 ± 3.2 | 64.3 | 99.5 |
 3 | Patronus-70B | 140 | 91.7 ± 0.8 | 83.7 | 54.4 ± 3.2 | 24.6 ± 2.7 | 48.8 ± 3.2 | 60.6 | 93.9 |
 4 | Nemotron-70B | 70 | 80.1 ± 1.1 | 85.0 | 53.6 ± 3.2 | 23.8 ± 2.7 | 55.6 ± 3.1 | 59.6 | 92.4 |