calcuis committed
Commit 0f18ddc · verified · 1 Parent(s): 6f83445

Update README.md

Files changed (1):
  1. README.md (+1 -1)
README.md CHANGED
@@ -57,4 +57,4 @@ at the high-level overview of the model quality on representative benchmarks. Fo
 | Factual Knowledge | SimpleQA | 3.0 | 7.6 | 5.4 | 9.9 | 20.9 | 10.2 | **39.4** |
 | Reasoning | DROP | 75.5 | 68.3 | 85.5 | 79.3 | **90.2** | 76.7 | 80.9 |
 
-\* these scores are lower than those reported by Meta, perhaps because simple-evals has a strict formatting requirement that Llama models have particular trouble following. We use the simple-evals framework because it is reproducible, but Meta reports 77 for MATH and 88 for HumanEval on Llama-3.3-70B.
+\* these scores are lower than those reported by Meta, perhaps because simple-evals has a strict formatting requirement that Llama models have particular trouble following.