Update README.md

README.md CHANGED

@@ -25,13 +25,13 @@ This instruction model was built via parameter-efficient QLoRA finetuning of [Ll
 
 | Metric                | Value |
 |-----------------------|-------|
-| MMLU (5-shot)         |
-| ARC (25-shot)         |
-| HellaSwag (10-shot)   |
-| TruthfulQA (0-shot)   |
-| Avg.                  |
+| MMLU (5-shot)         | 46.63 |
+| ARC (25-shot)         | 51.19 |
+| HellaSwag (10-shot)   | 78.92 |
+| TruthfulQA (0-shot)   | 48.5  |
+| Avg.                  | 56.31 |
 
-We use
+We use the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
 
 ## Helpful links
 