bfuzzy1/acheron

bfuzzy1 committed on Dec 18, 2024

Commit c869e20 (verified), 1 Parent(s): a9fd896

Update README.md

Files changed (1)

README.md CHANGED

@@ -22,17 +22,12 @@ Evaluation Results
 The model was evaluated across a range of tasks. Below are the final evaluation results (after removing GSM8k):
-| Parameters | Model     | MMLU  | ARC-C  | HellaSwag | PIQA   | Winogrande | Average |
-|------------|-----------|-------|--------|-----------|--------|------------|---------|
-| 500M       | qwen 2    | 44.13 | 28.92  | 49.05     | 69.31  | 56.99      | 49.68   |
-| 500M       | qwen 2.5  | 47.29 | 31.83  | 52.17     | 70.29  | 57.06      | 51.72   |
-| 1.24B      | llama 3.2 | 36.75 | 36.18  | 63.70     | 74.54  | 60.54      | 54.34   |
-| 514M       | archeon   | NA    | 32.34  | 47.80     | 74.37  | 62.12      | 54.16   |
-•	ARC Challenge: The model performs decently in answering general knowledge questions.
-•	HellaSwag: The model is strong in commonsense reasoning, performing well in predicting the next sequence of events in a given scenario.
-•	PIQA: The model excels in physical reasoning, showcasing a solid understanding of everyday physical interactions.
-•	Winogrande: It also shows competitive performance in linguistic reasoning tasks.
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65afe3fd7c11edbf6e1a1277/UjQV3U0cu2vzPWFK2LYjC.png)
+- ARC Challenge: The model performs decently in answering general knowledge questions.
+- HellaSwag: The model is strong in commonsense reasoning, performing well in predicting the next sequence of events in a given scenario.
+- PIQA: The model excels in physical reasoning, showcasing a solid understanding of everyday physical interactions.
+- Winogrande: It also shows competitive performance in linguistic reasoning tasks.
 Ethical Considerations
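
Note on the Average column of the table removed in this commit: the row for the 514M model has NA for MMLU, and its listed average (54.16) is consistent with a mean over the four remaining benchmarks, while the other rows average five. The sketch below recomputes the column from the scores in the removed table so the difference is visible. The dictionary layout and the helper name `mean_available` are illustrative assumptions, not part of the repository.

```python
# Scores copied from the removed table, in column order:
# MMLU, ARC-C, HellaSwag, PIQA, Winogrande. None stands for the "NA" entry.
scores = {
    "qwen 2": [44.13, 28.92, 49.05, 69.31, 56.99],
    "qwen 2.5": [47.29, 31.83, 52.17, 70.29, 57.06],
    "llama 3.2": [36.75, 36.18, 63.70, 74.54, 60.54],
    "archeon": [None, 32.34, 47.80, 74.37, 62.12],
}


def mean_available(values):
    """Average only the scores that are present (hypothetical helper)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


averages = {model: mean_available(vals) for model, vals in scores.items()}

for model, avg in averages.items():
    # Count how many benchmarks actually contributed to each average.
    n = sum(v is not None for v in scores[model])
    print(f"{model}: {avg:.2f} (mean of {n} benchmarks)")
```

Because the four-benchmark mean leaves out MMLU, it is probably not directly comparable with the five-benchmark averages of the other rows; comparing all models on the four shared benchmarks would be one way to put them on the same footing.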