---
license: apache-2.0
datasets:
  - Mathoctopus/GSM8KInstruct_Parallel
language:
  - en
  - es
  - zh
  - de
  - ru
  - th
  - sw
  - ja
  - fr
  - bn
---

## Introduction

We introduce 🐙 MathOctopus, a series of open-source large language models (LLMs) tailored to multilingual math problem-solving. The MathOctopus models are trained on the 🤗 MGSM8KInstruct dataset, which covers ten languages. MathOctopus notably outperforms conventional open-source LLMs and, in few-shot settings, surpasses ChatGPT.

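## Datasets

### MGSM8KInstruct

| Training Dataset | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
| MGSM8KInstruct | 7473 | 7472 | 7466 | 6539 | 7466 | 7470 | 7469 | 7471 | 7361 | 7473 | 73.6K |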
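As a quick consistency check on the training-set table, the per-language counts can be summed. This is plain arithmetic on the numbers above, not code from the MathOctopus repository; the variable names are illustrative.

```python
# Per-language example counts for MGSM8KInstruct, copied from the table above.
counts = {
    "En": 7473, "Sw": 7472, "Zh": 7466, "Bn": 6539, "De": 7466,
    "Es": 7470, "Fr": 7469, "Ja": 7471, "Ru": 7361, "Th": 7473,
}

# Total number of training examples across all ten languages.
total = sum(counts.values())

# Language split with the fewest examples.
smallest = min(counts, key=counts.get)

print(f"total = {total:,} examples; smallest split = {smallest} ({counts[smallest]:,})")
```

The exact total is 73,660 examples, which the table reports (truncated) as 73.6K; Bengali is the smallest split with 6,539 examples.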

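### MSVAMP

| Test Dataset | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
| MSVAMP | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 10K |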

## Usage

Our datasets and models are all available on Hugging Face:

- 🤗 MGSM8KInstruct_Parallel Dataset
- 🤗 MSVAMP Dataset

Or you can directly download them from

## Models

| Base Model: LLaMA | Parallel-Training | Cross-Training |
|:--|:--|:--|
| 7B-LLaMA 2 | 🐙 MathOctopus-Parallel-7B | 🐙 MathOctopus-Cross-7B |
| | 🐙 MathOctopus-Parallel-xRFT-7B | 🐙 MathOctopus-Cross-xRFT-7B |
| 13B-LLaMA 2 | 🐙 MathOctopus-Parallel-13B | 🐙 MathOctopus-Cross-13B |
| | 🐙 MathOctopus-Parallel-xRFT-13B | 🐙 MathOctopus-Cross-xRFT-13B |
| 33B-LLaMA 1 | 🐙 MathOctopus-Parallel-33B | 🐙 MathOctopus-Cross-33B |
| 70B-LLaMA 2 | Coming soon! | Coming soon! |

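`*-Parallel` denotes models trained with the parallel-training strategy, and `*-Cross` denotes models trained with the cross-training strategy. In the results tables below, MathOctopusP and MathOctopusC abbreviate the parallel- and cross-trained models, respectively.

`*-xRFT` denotes models trained with multilingual rejection sampling (xRFT).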
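The card names the xRFT recipe but does not specify it. As a rough illustration only, the sketch below implements a generic rejection-sampling filter in the style of rejection-sampling fine-tuning (RFT): among several reasoning paths sampled for one question, a path is kept only if its final answer matches the gold answer, and a path is dropped if it repeats the sequence of equations of a path already kept. The helpers `extract_final_number`, `equation_signature`, and `rejection_sample`, the "last number is the answer" rule, and the equation regex are all hypothetical and are not the authors' pipeline.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical equation pattern, e.g. "3 * 4 = 12": operands joined by + - * /, then "=" and a result.
EQUATION = re.compile(r"\d+(?:\.\d+)?(?:\s*[-+*/]\s*\d+(?:\.\d+)?)+\s*=\s*\d+(?:\.\d+)?")
# Any number, allowing thousands separators such as "1,250".
FINAL_NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def extract_final_number(text: str) -> Optional[float]:
    """Assumed extraction rule: the last number in a reasoning path is its answer."""
    matches = FINAL_NUMBER.findall(text)
    return float(matches[-1].replace(",", "")) if matches else None


def equation_signature(text: str) -> Tuple[str, ...]:
    """Whitespace-normalized sequence of equations appearing in a reasoning path."""
    return tuple(re.sub(r"\s+", "", eq) for eq in EQUATION.findall(text))


def rejection_sample(candidates: Iterable[str], gold: float, tol: float = 1e-6) -> List[str]:
    """Keep correct reasoning paths whose equation sequences are distinct."""
    kept: List[str] = []
    seen = set()
    for path in candidates:
        pred = extract_final_number(path)
        if pred is None or abs(pred - gold) > tol:
            continue  # reject: wrong or missing final answer
        sig = equation_signature(path)
        if sig in seen:
            continue  # reject: same equations as a path already kept
        seen.add(sig)
        kept.append(path)
    return kept


# Toy example: four sampled reasoning paths for a question whose answer is 17.
candidates = [
    "3 * 4 = 12, and 12 + 5 = 17. The answer is 17.",
    "3*4 = 12 and then 12+5 = 17, so the answer is 17.",  # duplicate equations
    "3 + 4 = 7, and 7 + 5 = 12. The answer is 12.",       # wrong answer
    "4 * 3 = 12, and 5 + 12 = 17. The answer is 17.",     # new, correct path
]
kept = rejection_sample(candidates, gold=17)
```

On the toy input, the first and last paths are kept: the second repeats the first path's equations and the third reaches the wrong answer. Applying such a filter to paths sampled in each of the ten languages is one plausible reading of "multilingual" here, not a confirmed detail.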

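### **Overall Results on MGSM**

| 7B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
| MathOctopusC | 52.0 | 23.6 | 31.6 | 18.8 | 38.0 | 39.2 | 36.4 | 27.2 | 33.6 | 21.6 | 32.2 |
| **xRFT**-MathOctopusC | 51.2 | 24.0 | 33.2 | 18.8 | 36.0 | 41.2 | 37.6 | 29.6 | 36.4 | 25.2 | 33.3 |
| MathOctopusP-LoRA | 30.4 | 15.2 | 23.6 | 10.4 | 22.8 | 24.8 | 26.4 | 18.0 | 22.0 | 14.8 | 20.8 |
| MathOctopusP | 52.4 | 39.2 | 38.4 | 28.8 | 44.8 | 42.4 | 43.6 | 36.0 | 39.6 | 34.4 | 40.0 |
| **xRFT**-MathOctopusP | 54.8 | 38.4 | 45.2 | 33.2 | 43.6 | 45.2 | 38.0 | 35.6 | 48.4 | 36.4 | 41.9 |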
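The card does not define the Overall column, but the numbers are consistent with an unweighted mean of the ten per-language accuracies, rounded to one decimal. The sketch below recomputes it for the 7B MGSM rows; this interpretation is inferred from the data, and the names `MGSM_7B` and `overall` are illustrative.

```python
from statistics import mean
from typing import Sequence

LANGS = ["En", "Sw", "Zh", "Bn", "De", "Es", "Fr", "Ja", "Ru", "Th"]

# 7B MGSM rows copied from the table above: (per-language accuracies in LANGS order, reported Overall).
MGSM_7B = {
    "MathOctopusC": ([52.0, 23.6, 31.6, 18.8, 38.0, 39.2, 36.4, 27.2, 33.6, 21.6], 32.2),
    "xRFT-MathOctopusC": ([51.2, 24.0, 33.2, 18.8, 36.0, 41.2, 37.6, 29.6, 36.4, 25.2], 33.3),
    "MathOctopusP-LoRA": ([30.4, 15.2, 23.6, 10.4, 22.8, 24.8, 26.4, 18.0, 22.0, 14.8], 20.8),
    "MathOctopusP": ([52.4, 39.2, 38.4, 28.8, 44.8, 42.4, 43.6, 36.0, 39.6, 34.4], 40.0),
    "xRFT-MathOctopusP": ([54.8, 38.4, 45.2, 33.2, 43.6, 45.2, 38.0, 35.6, 48.4, 36.4], 41.9),
}


def overall(scores: Sequence[float]) -> float:
    """Unweighted mean over the ten languages, rounded to one decimal like the tables."""
    if len(scores) != len(LANGS):
        raise ValueError(f"expected {len(LANGS)} scores, got {len(scores)}")
    return round(mean(scores), 1)


for name, (scores, reported) in MGSM_7B.items():
    print(f"{name:<20} recomputed={overall(scores):.1f} reported={reported:.1f}")
```

For all five 7B rows the recomputed value matches the reported Overall, which is why a simple macro-average is a reasonable reading of that column.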

| 13B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
| MathOctopusC | 56.4 | 27.2 | 39.2 | 24.0 | 47.6 | 49.6 | 47.6 | 40.4 | 42.0 | 24.8 | 39.9 |
| **xRFT**-MathOctopusC | 53.6 | 28.0 | 45.2 | 21.2 | 48.0 | 46.4 | 46.0 | 35.2 | 45.6 | 28.8 | 39.8 |
| MathOctopusP | 53.2 | 42.8 | 48.8 | 35.2 | 44.4 | 48.0 | 48.4 | 43.2 | 47.6 | 46.8 | 45.8 |
| **xRFT**-MathOctopusP | 51.6 | 46.0 | 51.2 | 42.0 | 49.2 | 53.2 | 49.6 | 39.6 | 47.6 | 46.0 | 47.6 |

| 30-34B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
| MathOctopusC | 55.6 | 24.4 | 36.0 | 19.2 | 40.4 | 51.2 | 44.4 | 27.2 | 37.2 | 21.6 | 35.7 |
| **xRFT**-MathOctopusC | 53.6 | 27.6 | 34.4 | 19.2 | 47.2 | 47.6 | 44.8 | 30.8 | 38.8 | 22.8 | 36.7 |
| MathOctopusP | 56.4 | 46.8 | 52.0 | 35.2 | 47.2 | 53.2 | 48.0 | 39.2 | 45.6 | 41.2 | 46.5 |
| **xRFT**-MathOctopusP | 51.6 | 47.2 | 52.4 | 37.6 | 51.2 | 52.8 | 44.4 | 41.6 | 50.0 | 47.6 | 47.6 |

### **Overall Results on MSVAMP**

| 7B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
| MathOctopusC | 49.2 | 36.6 | 43.6 | 30.2 | 48.6 | 46.8 | 46.4 | 42.5 | 46.7 | 34.0 | 42.5 |
| **xRFT**-MathOctopusC | 49.9 | 37.7 | 43.3 | 32.9 | 46.5 | 47.6 | 47.3 | 42.7 | 46.6 | 36.2 | 43.1 |
| MathOctopusP-LoRA | 30.4 | 15.2 | 23.6 | 10.4 | 22.8 | 24.8 | 26.4 | 18.0 | 22.0 | 14.8 | 20.8 |
| MathOctopusP | 46.5 | 40.1 | 42.5 | 29.1 | 43.5 | 45.4 | 46.0 | 42.5 | 45.4 | 35.7 | 41.7 |
| **xRFT**-MathOctopusP | 46.8 | 42.3 | 43.2 | 32.8 | 43.1 | 44.5 | 45.3 | 43.2 | 42.1 | 40.5 | 42.4 |

| 13B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
| MathOctopusC | 56.6 | 40.4 | 49.0 | 30.3 | 50.9 | 54.2 | 54.7 | 46.3 | 52.4 | 35.7 | 47.1 |
| **xRFT**-MathOctopusC | 52.9 | 41.9 | 49.2 | 34.1 | 50.5 | 52.8 | 51.5 | 45.8 | 50.2 | 35.7 | 46.5 |
| MathOctopusP | 50.7 | 43.4 | 42.6 | 31.8 | 48.4 | 49.4 | 50.6 | 41.1 | 46.9 | 39.3 | 44.4 |
| **xRFT**-MathOctopusP | 44.6 | 43.4 | 46.4 | 34.2 | 47.7 | 48.2 | 49.9 | 43.1 | 48.2 | 39.5 | 44.5 |

| 30-34B Model | En | Sw | Zh | Bn | De | Es | Fr | Ja | Ru | Th | Overall |
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
| MathOctopusC | 51.5 | 42.1 | 46.2 | 23.2 | 50.5 | 52.1 | 52.9 | 42.2 | 50.5 | 33.4 | 44.5 |
| **xRFT**-MathOctopusC | 48.1 | 42.8 | 43.6 | 23.3 | 48.7 | 50.0 | 48.9 | 43.4 | 44.6 | 35.5 | 42.9 |
| MathOctopusP | 56.4 | 46.8 | 52.0 | 35.2 | 47.2 | 53.2 | 48.0 | 39.2 | 45.6 | 41.2 | 46.5 |
| **xRFT**-MathOctopusP | 48.0 | 42.3 | 46.1 | 36.2 | 47.5 | 48.5 | 48.3 | 45.8 | 47.2 | 41.2 | 45.1 |

### **MathOctopus in English**

| Models | GSM8K | SVAMP |
|:--|:--|:--|
| LLaMA 2-7B | 42.4 | 38.3 |
| MathOctopusP-7B | 49.3 | 46.8 |
| MathOctopusC-7B | 50.8 | 49.3 |
| LLaMA 2-13B | 51.0 | 50.9 |
| MathOctopusP-13B | 55.5 | 52.1 |
| MathOctopusC-13B | 56.6 | 56.6 |
| LLaMA 1-33B | 50.0 | 49.0 |
| MathOctopusP-33B | 56.0 | 52.5 |
| MathOctopusC-33B | 53.7 | 51.5 |

## Intended Uses

These models are released for research purposes and are designed to solve multilingual math problems. They can be used in educational software, tutoring systems, or any other application that needs solutions to math problems.
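A minimal inference sketch with Hugging Face `transformers` follows. Several details are assumptions rather than facts from this card: the repository ID `Mathoctopus/Parallel_xRFT_7B` is hypothetical (substitute the ID shown on the model page), the Alpaca-style prompt template may differ from the one used in training (check the MathOctopus repository), and loading in float16 with `device_map="auto"` assumes a GPU plus the `accelerate` package.

```python
# Assumed Alpaca-style prompt template; not confirmed by this card.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{question}\n\n### Response:"
)


def build_prompt(question: str) -> str:
    """Wrap a math question (in any of the ten languages) in the assumed template."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def solve(model_id: str, question: str, max_new_tokens: int = 512) -> str:
    """Generate a step-by-step solution with greedy decoding."""
    # Imported lazily so build_prompt works without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `print(solve("Mathoctopus/Parallel_xRFT_7B", question))`, where `question` is a problem in any of the ten supported languages and the repository ID is the hypothetical one above. Greedy decoding (`do_sample=False`) keeps outputs deterministic, which is convenient when comparing against reference answers.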