Update README.md
README.md
CHANGED
@@ -91,9 +91,7 @@ The plot below highlights the alignment comparison of the model trained with Con
 The table below summarizes the evaluation results across mathematical tasks and original capabilities for various models and training approaches.
 
 | **Model** | **Math Tasks** | | | | **Original Capabilities** | | | | **Overall Avg.** |
-|--------------------------|----------------------------|----------|-----------|----------|-----------------------------|---------|---------|-----------|------------------|
 | | **MathHard** | **Math** | **GSM8K** | **Avg.** | **ARC** | **GPQA**| **MMLU**| **MMLUP** | |
-|--------------------------|----------------------------|----------|-----------|----------|-----------------------------|---------|---------|-----------|------------------|
 | Llama3.1-8B-Instruct | 23.7 | 50.9 | 85.6 | 52.1 | 83.4 | 29.9 | 72.4 | 46.7 | 56.3 |
 | OpenMath2-Llama3.1 | 38.4 | 64.1 | 90.3 | 64.3 | 45.8 | 1.3 | 4.5 | 19.5 | 38.6 |
 | **Full Param Tune** | **38.5** | **63.7** | 90.2 | **63.9** | 58.2 | 1.1 | 7.3 | 23.5 | 40.1 |