Update README.md
README.md
CHANGED
@@ -22,17 +22,12 @@ Evaluation Results
 
 The model was evaluated across a range of tasks. Below are the final evaluation results (after removing GSM8k):
 
-
-
-
-
-
-
-
-• ARC Challenge: The model performs decently in answering general knowledge questions.
-• HellaSwag: The model is strong in commonsense reasoning, performing well in predicting the next sequence of events in a given scenario.
-• PIQA: The model excels in physical reasoning, showcasing a solid understanding of everyday physical interactions.
-• Winogrande: It also shows competitive performance in linguistic reasoning tasks.
+
+
+- ARC Challenge: The model performs decently in answering general knowledge questions.
+- HellaSwag: The model is strong in commonsense reasoning, performing well in predicting the next sequence of events in a given scenario.
+- PIQA: The model excels in physical reasoning, showcasing a solid understanding of everyday physical interactions.
+- Winogrande: It also shows competitive performance in linguistic reasoning tasks.
 
 Ethical Considerations
 