open-r1-eval-leaderboard / eval_results

Commit History

Upload eval_results/google/gemma-2b-it/main/gsm8k/results_2024-03-18T20-39-56.154693.json with huggingface_hub
40e84db
verified

lewtun HF staff committed

Upload eval_results/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/main/bbh/results_2024-03-18T20-39-32.747348.json with huggingface_hub
d30acf6
verified

lewtun HF staff committed

Upload eval_results/google/gemma-7b-it/main/bbh/results_2024-03-18T20-38-30.588009.json with huggingface_hub
4c6f00d
verified

lewtun HF staff committed

Upload eval_results/google/gemma-2b-it/main/bbh/results_2024-03-18T20-36-40.384973.json with huggingface_hub
589bc06
verified

lewtun HF staff committed

Upload eval_results/HuggingFaceH4/zephyr-7b-beta/main/bbh/results_2024-03-18T20-36-18.052319.json with huggingface_hub
6cb7ce4
verified

lewtun HF staff committed

Upload eval_results/HuggingFaceH4/zephyr-7b-beta-ift/v0.2/bbh/results_2024-03-18T20-33-46.216888.json with huggingface_hub
ba09d8b
verified

lewtun HF staff committed

Upload eval_results/Qwen/Qwen1.5-14B-Chat/main/bbh/results_2024-03-18T20-23-46.729702.json with huggingface_hub
737bf1f
verified

lewtun HF staff committed

Upload eval_results/Qwen/Qwen1.5-4B-Chat/main/bbh/results_2024-03-18T20-20-24.322898.json with huggingface_hub
1b733bf
verified

lewtun HF staff committed

Upload eval_results/Qwen/Qwen1.5-1.8B-Chat/main/bbh/results_2024-03-18T20-11-24.511185.json with huggingface_hub
754f9ed
verified

lewtun HF staff committed

Upload eval_results/teknium/OpenHermes-2.5-Mistral-7B/main/bbh/results_2024-03-18T19-49-31.908303.json with huggingface_hub
40f3905
verified

lewtun HF staff committed

Upload eval_results/Qwen/Qwen1.5-0.5B-Chat/main/bbh/results_2024-03-18T19-43-00.075213.json with huggingface_hub
7add925
verified

lewtun HF staff committed

Upload eval_results/Qwen/Qwen1.5-0.5B-Chat/main/ifeval/results_2024-03-18T19-36-03.767073.json with huggingface_hub
73b2107
verified

lewtun HF staff committed

Upload eval_results/Qwen/Qwen1.5-0.5B-Chat/main/mmlu/results_2024-03-18T17-01-47.545823.json with huggingface_hub
b9d64e1
verified

lewtun HF staff committed

Upload eval_results/Qwen/Qwen1.5-0.5B-Chat/main/gsm8k/results_2024-03-18T17-01-30.856762.json with huggingface_hub
29394c9
verified

lewtun HF staff committed

Upload eval_results/Qwen/Qwen1.5-0.5B-Chat/main/hellaswag/results_2024-03-18T16-55-05.213121.json with huggingface_hub
8e02df0
verified

lewtun HF staff committed

Upload eval_results/Qwen/Qwen1.5-0.5B-Chat/main/arc/results_2024-03-18T16-50-06.159327.json with huggingface_hub
1a1c5de
verified

lewtun HF staff committed

Upload eval_results/Qwen/Qwen1.5-0.5B-Chat/main/truthfulqa/results_2024-03-18T16-49-56.225132.json with huggingface_hub
3c8fa5b
verified

lewtun HF staff committed

Upload eval_results/Qwen/Qwen1.5-0.5B-Chat/main/winogrande/results_2024-03-18T16-49-32.523482.json with huggingface_hub
64a7a0d
verified

lewtun HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.13/gsm8k/results_2024-03-18T16-42-34.222418.json with huggingface_hub
221b580
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.13/ifeval/results_2024-03-18T16-40-04.435577.json with huggingface_hub
e9088c8
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.9/gsm8k/results_2024-03-18T15-21-24.552464.json with huggingface_hub
437b3d5
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.9/ifeval/results_2024-03-18T15-18-41.904972.json with huggingface_hub
06c723e
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.6/gsm8k/results_2024-03-16T22-45-24.270002.json with huggingface_hub
47e1f74
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.7/gsm8k/results_2024-03-16T22-43-25.759701.json with huggingface_hub
63ae265
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.8/gsm8k/results_2024-03-16T22-43-20.193012.json with huggingface_hub
f7627c3
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.6/ifeval/results_2024-03-16T22-42-33.074718.json with huggingface_hub
2836f47
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.7/ifeval/results_2024-03-16T22-41-15.459095.json with huggingface_hub
3374a12
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.8/ifeval/results_2024-03-16T22-40-25.432296.json with huggingface_hub
d2783be
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/zephyr-7b-beta-ift/v0.3/gsm8k/results_2024-03-14T23-26-33.789538.json with huggingface_hub
5136f11
verified

lewtun HF staff committed

Upload eval_results/HuggingFaceH4/zephyr-7b-beta-ift/v0.3/ifeval/results_2024-03-14T23-22-03.186605.json with huggingface_hub
4bf4caa
verified

lewtun HF staff committed

Upload eval_results/HuggingFaceH4/zephyr-7b-beta-dpo/v0.2.7/gsm8k/results_2024-03-14T22-48-09.910141.json with huggingface_hub
d67ef71
verified

lewtun HF staff committed

Upload eval_results/HuggingFaceH4/zephyr-7b-beta-dpo/v0.2.7/ifeval/results_2024-03-14T22-41-25.248142.json with huggingface_hub
781f88b
verified

lewtun HF staff committed

Upload eval_results/HuggingFaceH4/zephyr-7b-beta-dpo/v0.2.6/gsm8k/results_2024-03-14T20-44-20.773569.json with huggingface_hub
b60d877
verified

lewtun HF staff committed

Upload eval_results/HuggingFaceH4/zephyr-7b-beta-dpo/v0.2.6/ifeval/results_2024-03-14T20-36-49.238793.json with huggingface_hub
46ddb36
verified

lewtun HF staff committed

Upload eval_results/HuggingFaceH4/zephyr-7b-beta-ift/v0.2/gsm8k/results_2024-03-14T18-39-43.255609.json with huggingface_hub
dd32f1c
verified

lewtun HF staff committed

Upload eval_results/HuggingFaceH4/zephyr-7b-beta-ift/v0.2/ifeval/results_2024-03-14T18-32-27.610398.json with huggingface_hub
71c0675
verified

lewtun HF staff committed

Upload eval_results/HuggingFaceH4/zephyr-7b-beta-dpo/v0.2.5/gsm8k/results_2024-03-14T18-10-48.076806.json with huggingface_hub
e02003b
verified

lewtun HF staff committed

Upload eval_results/HuggingFaceH4/zephyr-7b-beta-dpo/v0.2.5/ifeval/results_2024-03-14T18-01-32.182660.json with huggingface_hub
8f343ec
verified

lewtun HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.3/mmlu/results_2024-03-14T12-37-10.426455.json with huggingface_hub
3bf1be5
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.2/mmlu/results_2024-03-14T12-37-03.903529.json with huggingface_hub
ad5695c
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.1/mmlu/results_2024-03-14T12-36-53.798690.json with huggingface_hub
f99fd3d
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.0/mmlu/results_2024-03-14T12-36-38.178613.json with huggingface_hub
437d3fa
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.3/gsm8k/results_2024-03-14T12-37-21.372718.json with huggingface_hub
d487bc1
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.2/gsm8k/results_2024-03-14T12-36-53.394926.json with huggingface_hub
52947ec
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.1/gsm8k/results_2024-03-14T12-36-34.373733.json with huggingface_hub
83ca56e
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.0/gsm8k/results_2024-03-14T12-36-02.642572.json with huggingface_hub
6195b1a
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.1/ifeval/results_2024-03-14T12-35-23.992867.json with huggingface_hub
762554e
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.2/ifeval/results_2024-03-14T12-35-00.673167.json with huggingface_hub
75550f7
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.3/ifeval/results_2024-03-14T12-34-41.592801.json with huggingface_hub
0bb5e21
verified

edbeeching HF staff committed

Upload eval_results/HuggingFaceH4/qwen-1.5-1.8b-dpo/v0.0/ifeval/results_2024-03-14T12-34-18.153949.json with huggingface_hub
44af62b
verified

edbeeching HF staff committed