Taka008 committed
Commit 89de9c5 · verified · 1 Parent(s): 95242c3

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -119,7 +119,7 @@ The models have been pre-trained using a blend of the following datasets.
  ### Mid-training
 
  In the LLM-jp-3.1 series, we performed continued pre-training based on [Instruction Pre-Training](https://aclanthology.org/2024.emnlp-main.148/).
- Instruction Pre-Training is a method that enhances a model’s ability to follow instructions by continuing pre-training on a large collection of instruction–response pairs.
+ Instruction Pre-Training enhances a model’s ability to follow instructions by continuing pre-training on a large collection of instruction–response pairs.
  We prepared approximately 90B tokens of instruction–response data and mixed it with our pre-training datasets, conducting continued pre-training on a total of 400B tokens.
  Each model was initialized from existing checkpoints ([llm-jp/llm-jp-3-1.8b](https://huggingface.co/llm-jp/llm-jp-3-1.8b), [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b), and [llm-jp/llm-jp-3-8x13b](https://huggingface.co/llm-jp/llm-jp-3-8x13b)) and underwent continued instruction pre-training.
  Since the LLM-jp-3 series was originally pre-trained on 2.1T tokens, the total pre-training token count amounts to 2.5T tokens.
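For readers who want a concrete picture of the token budget and data mixing described in the hunk above, here is a minimal sketch. It is not the authors' actual pipeline: the helper name, the instruction–response formatting template, and the mixing arithmetic layout are illustrative assumptions; only the token counts (90B, 400B, 2.1T, 2.5T) come from the model card.

```python
# Minimal sketch of the mid-training token budget and data mixing described above.
# NOT the authors' pipeline; helper names and the text template are hypothetical.
# Only the token counts (90B, 400B, 2.1T, 2.5T) come from the model card.

INSTRUCTION_TOKENS = 90e9        # ~90B tokens of instruction–response data
MIDTRAIN_TOKENS = 400e9          # total continued pre-training budget
BASE_PRETRAIN_TOKENS = 2.1e12    # LLM-jp-3 original pre-training tokens

# The remainder of the mid-training budget is drawn from the ordinary pre-training datasets.
pretrain_portion = MIDTRAIN_TOKENS - INSTRUCTION_TOKENS          # ~310B tokens
total_after_midtrain = BASE_PRETRAIN_TOKENS + MIDTRAIN_TOKENS    # 2.5T tokens


def as_pretraining_text(pair: dict) -> str:
    """Render one instruction–response pair as plain text so it can be mixed
    into the pre-training token stream (the template is an illustrative assumption)."""
    return f"{pair['instruction']}\n{pair['response']}\n\n"


if __name__ == "__main__":
    print(f"instruction share of mid-training: {INSTRUCTION_TOKENS / MIDTRAIN_TOKENS:.0%}")
    print(f"tokens drawn from pre-training corpora: {pretrain_portion / 1e9:.0f}B")
    print(f"total pre-training tokens after mid-training: {total_after_midtrain / 1e12:.1f}T")
    print(as_pretraining_text({"instruction": "Translate 'hello' into Japanese.",
                               "response": "こんにちは"}))
```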