Update README.md
README.md CHANGED
```diff
@@ -119,7 +119,7 @@ The models have been pre-trained using a blend of the following datasets.
 ### Mid-training
 
 In the LLM-jp-3.1 series, we performed continued pre-training based on [Instruction Pre-Training](https://aclanthology.org/2024.emnlp-main.148/).
-Instruction Pre-Training
+Instruction Pre-Training enhances a model’s ability to follow instructions by continuing pre-training on a large collection of instruction–response pairs.
 We prepared approximately 90B tokens of instruction–response data and mixed it with our pre-training datasets, conducting continued pre-training on a total of 400B tokens.
 Each model was initialized from existing checkpoints ([llm-jp/llm-jp-3-1.8b](https://huggingface.co/llm-jp/llm-jp-3-1.8b), [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b), and [llm-jp/llm-jp-3-8x13b](https://huggingface.co/llm-jp/llm-jp-3-8x13b)) and underwent continued instruction pre-training.
 Since the LLM-jp-3 series was originally pre-trained on 2.1T tokens, the total pre-training token count amounts to 2.5T tokens.
```
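The following is a minimal sketch of the data-mixing idea described in the updated section, assuming the Hugging Face `transformers` and `datasets` libraries and placeholder corpora in place of the real pre-training and instruction–response datasets. It is not the pipeline used to train LLM-jp-3.1; only the checkpoint names and the 90B-of-400B token budget are taken from the README text above.

```python
# Illustrative sketch only, not the LLM-jp-3.1 training code.
# Grounded facts from the README: the starting checkpoints (e.g. llm-jp/llm-jp-3-1.8b)
# and the budget of ~90B instruction-response tokens within a 400B-token run.
# Everything else (dataset contents, prompt format) is a placeholder.
from datasets import Dataset, interleave_datasets
from transformers import AutoModelForCausalLM, AutoTokenizer

# Continued pre-training starts from an existing LLM-jp-3 checkpoint
# (this downloads the weights).
base = "llm-jp/llm-jp-3-1.8b"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Placeholder corpora standing in for the pre-training and
# instruction-response datasets.
pretrain_ds = Dataset.from_dict({"text": [
    "An ordinary pre-training document ...",
    "Another web or code document ...",
]})
instruct_ds = Dataset.from_dict({"text": [
    "Instruction: Summarize the passage ...\nResponse: ...",
    "Instruction: Translate the sentence into Japanese ...\nResponse: ...",
]})

# About 90B of the 400B continued pre-training tokens are
# instruction-response data, i.e. a ~22.5% sampling ratio.
instruction_ratio = 90 / 400
mixed = interleave_datasets(
    [pretrain_ds, instruct_ds],
    probabilities=[1 - instruction_ratio, instruction_ratio],
    seed=42,
    stopping_strategy="all_exhausted",
)

# The budget in the README is counted in tokens, so measure the mixed
# stream with the model's own tokenizer.
n_tokens = sum(len(tokenizer(ex["text"])["input_ids"]) for ex in mixed)
print(f"placeholder mixed corpus: {n_tokens} tokens")
```

In an actual run the ratio would be enforced over token counts rather than sampled per document, and the interleaved stream would feed a distributed causal-LM training loop; the sketch only shows where the instruction–response documents enter the data stream relative to the base checkpoint.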