open-r1-eval-leaderboard

lewtun (HF Staff) committed on Sep 6, 2024

Commit

11b25cb

verified

1 Parent(s): ea9d844

Upload eval_results/mattshumer/Reflection-Llama-3.1-70B/main/mixeval/results_2024-09-06T22-25-59.json with huggingface_hub

Files changed (1)

eval_results/mattshumer/Reflection-Llama-3.1-70B/main/mixeval/results_2024-09-06T22-25-59.json ADDED

+{
+    "overall score (final score)": 0.8075,
+    "DROP": 0.936,
+    "BBH": 0.926,
+    "MATH": 0.865,
+    "GSM8k": 0.959,
+    "TriviaQA": 0.888,
+    "AGIEval": 0.634,
+    "MMLU": 0.718,
+    "MBPP": 0.0,
+    "BoolQ": 0.869,
+    "HellaSwag": 0.549,
+    "GPQA": 0.0,
+    "PIQA": 0.844,
+    "ARC": 0.913,
+    "OpenBookQA": 0.75,
+    "SIQA": 0.765,
+    "CommonsenseQA": 0.623,
+    "WinoGrande": 1.0
+}
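
A quick sanity check of the added results could look like the following sketch. It embeds the values from the diff, compares the reported overall score with a plain unweighted mean of the per-benchmark scores, and lists benchmarks that sit at exactly 0.0 or 1.0. The helper name `summarize` and the embedded `RESULTS_JSON` string are illustrative assumptions, not part of the leaderboard code, and the file itself does not say how the overall score is computed or why some scores are at the extremes.

```python
import json

# Values copied from the diff above. In practice one would json.load() the
# results file instead of embedding it (the embedded string is an assumption
# made to keep this sketch self-contained).
RESULTS_JSON = """
{
    "overall score (final score)": 0.8075,
    "DROP": 0.936, "BBH": 0.926, "MATH": 0.865, "GSM8k": 0.959,
    "TriviaQA": 0.888, "AGIEval": 0.634, "MMLU": 0.718, "MBPP": 0.0,
    "BoolQ": 0.869, "HellaSwag": 0.549, "GPQA": 0.0, "PIQA": 0.844,
    "ARC": 0.913, "OpenBookQA": 0.75, "SIQA": 0.765,
    "CommonsenseQA": 0.623, "WinoGrande": 1.0
}
"""

OVERALL_KEY = "overall score (final score)"


def summarize(results):
    """Hypothetical helper: separate the overall score from per-benchmark scores."""
    per_benchmark = {k: v for k, v in results.items() if k != OVERALL_KEY}
    # Plain average over benchmarks, ignoring any per-benchmark weighting.
    unweighted_mean = sum(per_benchmark.values()) / len(per_benchmark)
    # Scores at exactly 0.0 or 1.0 are worth a second look before publishing.
    extremes = sorted(k for k, v in per_benchmark.items() if v in (0.0, 1.0))
    return {
        "overall": results[OVERALL_KEY],
        "unweighted_mean": round(unweighted_mean, 4),
        "extremes": extremes,
    }


summary = summarize(json.loads(RESULTS_JSON))
print(summary)
```

On these numbers the unweighted mean comes out lower than the reported overall score, so the overall score is evidently not a plain average of the listed benchmarks. The three benchmarks at the extremes (MBPP, GPQA, WinoGrande) are the ones to inspect first.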