LLaMAX/LLaMAX2-7B-MetaMath


huangzixian committed on Jul 9, 2024

Commit 29d7d78

1 Parent(s): f896060

update readme


Files changed (1)

README.md +13 -16

README.md CHANGED

@@ -1,16 +1,16 @@
 ### Model Sources
-**Paper**: LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
-Link: https://arxiv.org/pdf/2407
 ### Model Description
-🔥 LLaMAX-7B-MetaMath is fully fine-tuned on the MetaMathQA dataset, starting from the powerful multilingual model LLaMAX-7B.
-🔥 Compared with [MetaMath-7B](https://huggingface.co/meta-math/MetaMath-7B-V1.0), LLaMAX-7B-MetaMath performs significantly better at mathematical reasoning in low-resource languages, improving their average accuracy on the MGSM dataset by up to 18.8 percentage points.
-🔥 LLaMAX-7B-MetaMath demonstrates strong multilingual math reasoning across all languages, improving the average accuracy over all MGSM languages by 6.2 percentage points.
 ### Model Usage
@@ -46,20 +46,17 @@ the total number of words (1050) by the number of days in two weeks (14). So, th
 1050/14 = 75 words in each daily crossword puzzle on average. #### The answer is: 75“
 ```
 ### Experiments
-We evaluated LLaMAX-7B-MetaMath on the MGSM dataset. Compared with MetaMath-7B, LLaMAX-7B-MetaMath achieves higher average accuracy on both high-resource languages (Hrl.) and low-resource languages (Lrl.).
-| MGSM                     | Bn   | Th   | Sw   | Ja   | Zh   | De   | Fr   | Ru   | Es   | En   | Lrl. | Hrl. | Avg.  |
-|--------------------------|------|------|------|------|------|------|------|------|------|------|------|------|-------|
-| MetaMath-7B (official)   | 6.8  | 7.2  | 6.8  | 36.4 | 38.4 | 55.2 | 54.4 | 52.0 | 57.2 | 68.8 | 6.9  | 51.8 | 38.32 |
-| MetaMath-7B (Reproduced) | 6.0  | 10.0 | 4.4  | 36.4 | 42.8 | 52.8 | 56.0 | 48.8 | 58.8 | 64.8 | 6.8  | 51.5 | 38.08 |
-| LLaMAX-7B-MetaMath       | 26.8 | 24.0 | 26.0 | 35.6 | 42.4 | 56.8 | 55.2 | 53.6 | 56.8 | 65.6 | 25.6 | 52.3 | 44.28 |
 ### Citation
 If our model helps your work, please cite this paper:
 ```
-@inproceedings{Huang2024MindMergerEB,
-  title={XLLaMA2: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages},
-  year={2024},
-}
 ```

 ### Model Sources
+- **Paper**: LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
+- **Link**:
+- **Repository**: https://github.com/CONE-MT/LLaMAX/
 ### Model Description
+🔥 LLaMAX2-7B-MetaMath is fully fine-tuned on the MetaMathQA dataset, starting from the powerful multilingual model LLaMAX2-7B.
+🔥 Compared with [MetaMath-7B](https://huggingface.co/meta-math/MetaMath-7B-V1.0), LLaMAX2-7B-MetaMath performs significantly better at mathematical reasoning in low-resource languages, improving their average accuracy on the MGSM dataset by up to 18.8 percentage points.
+🔥 LLaMAX2-7B-MetaMath demonstrates strong multilingual math reasoning across all languages, improving the average accuracy over all MGSM languages by 6.2 percentage points.
 ### Model Usage
 1050/14 = 75 words in each daily crossword puzzle on average. #### The answer is: 75“
 ```
 ### Experiments
+We evaluated LLaMAX2-7B-MetaMath on the MGSM dataset. Compared with MetaMath-7B, LLaMAX2-7B-MetaMath achieves higher average accuracy on both high-resource languages (Hrl.) and low-resource languages (Lrl.).
+| MGSM                     | Avg.  | Lrl. | Hrl. | Bn   | Th   | Sw   | Ja   | Zh   | De   | Fr   | Ru   | Es   | En   |
+|--------------------------|-------|------|------|------|------|------|------|------|------|------|------|------|------|
+| MetaMath-7B (official)   | 38.32 | 6.9  | 51.8 | 6.8  | 7.2  | 6.8  | 36.4 | 38.4 | 55.2 | 54.4 | 52.0 | 57.2 | 68.8 |
+| MetaMath-7B (Reproduced) | 38.08 | 6.8  | 51.5 | 6.0  | 10.0 | 4.4  | 36.4 | 42.8 | 52.8 | 56.0 | 48.8 | 58.8 | 64.8 |
+| LLaMAX2-7B-MetaMath      | 44.28 | 25.6 | 52.3 | 26.8 | 24.0 | 26.0 | 35.6 | 42.4 | 56.8 | 55.2 | 53.6 | 56.8 | 65.6 |
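The Avg., Lrl., and Hrl. columns are consistent with unweighted means over the per-language scores, with Bn, Th, and Sw as the low-resource group and the other seven languages as the high-resource group. The README does not state this definition, so the sketch below assumes it. It recomputes the summary columns and the gains quoted in the model description; the names `summarize`, `RAW`, and `LANGS` are illustrative only.

```python
# Assumption: Avg./Lrl./Hrl. are unweighted means of per-language MGSM accuracy (%).
LOW_RESOURCE = ["Bn", "Th", "Sw"]
HIGH_RESOURCE = ["Ja", "Zh", "De", "Fr", "Ru", "Es", "En"]
LANGS = LOW_RESOURCE + HIGH_RESOURCE  # same column order as Bn..En in the table

# Per-language scores copied from the MGSM table.
RAW = {
    "MetaMath-7B (official)":   [6.8, 7.2, 6.8, 36.4, 38.4, 55.2, 54.4, 52.0, 57.2, 68.8],
    "MetaMath-7B (Reproduced)": [6.0, 10.0, 4.4, 36.4, 42.8, 52.8, 56.0, 48.8, 58.8, 64.8],
    "LLaMAX2-7B-MetaMath":      [26.8, 24.0, 26.0, 35.6, 42.4, 56.8, 55.2, 53.6, 56.8, 65.6],
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def summarize(values):
    """Return (Avg., Lrl., Hrl.) for one row of per-language scores."""
    per_lang = dict(zip(LANGS, values))
    lrl = mean([per_lang[lang] for lang in LOW_RESOURCE])
    hrl = mean([per_lang[lang] for lang in HIGH_RESOURCE])
    avg = mean([per_lang[lang] for lang in LANGS])
    return avg, lrl, hrl


summary = {name: summarize(values) for name, values in RAW.items()}
for name, (avg, lrl, hrl) in summary.items():
    print(f"{name:<26} Avg.={avg:.2f}  Lrl.={lrl:.1f}  Hrl.={hrl:.1f}")

# Gains of LLaMAX2-7B-MetaMath over the reproduced MetaMath-7B baseline.
base = summary["MetaMath-7B (Reproduced)"]
ours = summary["LLaMAX2-7B-MetaMath"]
print(f"Lrl. gain: {ours[1] - base[1]:.1f} points")
print(f"Avg. gain: {ours[0] - base[0]:.1f} points")
```

Against the official MetaMath-7B row instead, the low-resource gain is about 18.7 points and the overall gain about 6.0 points, so the 18.8 and 6.2 figures correspond to the reproduced baseline.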
 ### Citation
 If our model helps your work, please cite this paper:
 ```
 ```