Update README.md
README.md CHANGED
```diff
@@ -119,7 +119,7 @@ The models have been pre-trained using a blend of the following datasets.
 ### Mid-training
 
 In the LLM-jp-3.1 series, we performed continued pre-training based on [Instruction Pre-Training](https://aclanthology.org/2024.emnlp-main.148/).
-Instruction Pre-Training
+Instruction Pre-Training enhances a model’s ability to follow instructions by continuing pre-training on a large collection of instruction–response pairs.
 We prepared approximately 90B tokens of instruction–response data and mixed it with our pre-training datasets, conducting continued pre-training on a total of 400B tokens.
 Each model was initialized from existing checkpoints ([llm-jp/llm-jp-3-1.8b](https://huggingface.co/llm-jp/llm-jp-3-1.8b), [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b), and [llm-jp/llm-jp-3-8x13b](https://huggingface.co/llm-jp/llm-jp-3-8x13b)) and underwent continued instruction pre-training.
 Since the LLM-jp-3 series was originally pre-trained on 2.1T tokens, the total pre-training token count amounts to 2.5T tokens.
```
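The following is a minimal sketch of the data-mixing idea described in the updated section, assuming the Hugging Face `transformers` and `datasets` libraries and placeholder corpora in place of the real pre-training and instruction–response datasets. It is not the pipeline used to train LLM-jp-3.1; only the checkpoint names and the 90B-of-400B token budget are taken from the README text above.

```python
# Illustrative sketch only, not the LLM-jp-3.1 training code.
# Grounded facts from the README: the starting checkpoints (e.g. llm-jp/llm-jp-3-1.8b)
# and the budget of ~90B instruction-response tokens within a 400B-token run.
# Everything else (dataset contents, prompt format) is a placeholder.
from datasets import Dataset, interleave_datasets
from transformers import AutoModelForCausalLM, AutoTokenizer

# Continued pre-training starts from an existing LLM-jp-3 checkpoint
# (this downloads the weights).
base = "llm-jp/llm-jp-3-1.8b"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Placeholder corpora standing in for the pre-training and
# instruction-response datasets.
pretrain_ds = Dataset.from_dict({"text": [
    "An ordinary pre-training document ...",
    "Another web or code document ...",
]})
instruct_ds = Dataset.from_dict({"text": [
    "Instruction: Summarize the passage ...\nResponse: ...",
    "Instruction: Translate the sentence into Japanese ...\nResponse: ...",
]})

# About 90B of the 400B continued pre-training tokens are
# instruction-response data, i.e. a ~22.5% sampling ratio.
instruction_ratio = 90 / 400
mixed = interleave_datasets(
    [pretrain_ds, instruct_ds],
    probabilities=[1 - instruction_ratio, instruction_ratio],
    seed=42,
    stopping_strategy="all_exhausted",
)

# The budget in the README is counted in tokens, so measure the mixed
# stream with the model's own tokenizer.
n_tokens = sum(len(tokenizer(ex["text"])["input_ids"]) for ex in mixed)
print(f"placeholder mixed corpus: {n_tokens} tokens")
```

In an actual run the ratio would be enforced over token counts rather than sampled per document, and the interleaved stream would feed a distributed causal-LM training loop; the sketch only shows where the instruction–response documents enter the data stream relative to the base checkpoint.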