bfuzzy1 committed
Commit c869e20 (verified) · 1 parent: a9fd896

Update README.md

Files changed (1): README.md (+6 −11)
README.md CHANGED
@@ -22,17 +22,12 @@ Evaluation Results
 
 The model was evaluated across a range of tasks. Below are the final evaluation results (after removing GSM8k):
 
-| Parameters | Model | MMLU | ARC-C | HellaSwag | PIQA | Winogrande | Average |
-|------------|-----------|-------|--------|-----------|--------|------------|---------|
-| 500M | qwen 2 | 44.13 | 28.92 | 49.05 | 69.31 | 56.99 | 49.68 |
-| 500M | qwen 2.5 | 47.29 | 31.83 | 52.17 | 70.29 | 57.06 | 51.72 |
-| 1.24B | llama 3.2 | 36.75 | 36.18 | 63.70 | 74.54 | 60.54 | 54.34 |
-| 514M | archeon | NA | 32.34 | 47.80 | 74.37 | 62.12 | 54.16 |
-
-• ARC Challenge: The model performs decently in answering general knowledge questions.
-• HellaSwag: The model is strong in commonsense reasoning, performing well in predicting the next sequence of events in a given scenario.
-• PIQA: The model excels in physical reasoning, showcasing a solid understanding of everyday physical interactions.
-• Winogrande: It also shows competitive performance in linguistic reasoning tasks.
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65afe3fd7c11edbf6e1a1277/UjQV3U0cu2vzPWFK2LYjC.png)
+
+- ARC Challenge: The model performs decently in answering general knowledge questions.
+- HellaSwag: The model is strong in commonsense reasoning, performing well in predicting the next sequence of events in a given scenario.
+- PIQA: The model excels in physical reasoning, showcasing a solid understanding of everyday physical interactions.
+- Winogrande: It also shows competitive performance in linguistic reasoning tasks.
 
 Ethical Considerations
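The Average column in the table removed above can be sanity-checked. Assuming it is the plain mean of the available per-task scores (with MMLU excluded for archeon, whose MMLU entry is NA) — an averaging rule inferred here, not stated in the README — the reported values are reproduced to within rounding:

```python
# Sketch: recompute the Average column of the removed benchmark table.
# Scores are copied from the diff above; treating "Average" as the mean
# of the available task scores (skipping NA entries) is an assumption.
scores = {
    "qwen 2":    [44.13, 28.92, 49.05, 69.31, 56.99],  # MMLU, ARC-C, HellaSwag, PIQA, Winogrande
    "qwen 2.5":  [47.29, 31.83, 52.17, 70.29, 57.06],
    "llama 3.2": [36.75, 36.18, 63.70, 74.54, 60.54],
    "archeon":   [32.34, 47.80, 74.37, 62.12],         # MMLU is NA, so it is left out
}

for model, s in scores.items():
    print(f"{model}: {sum(s) / len(s):.2f}")
```

Each recomputed mean matches the table's Average to within 0.01, which supports reading that column as a simple mean over the tasks with reported scores.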