Update README.md
README.md CHANGED
@@ -15,10 +15,10 @@ language:
This model was built with continued pre-training and fine-tuning (instruction tuning) using the depth up-scaling (DUS) technique introduced by Upstage.

### DUS (Depth Up-Scaling) and continued pre-training

-Following the methodology described in the paper, we expanded the model from 32 transformer blocks to 48 and then continued pre-training on public data. Pre-training took 3 days on 4 ml.g5.48xlarge instances from AWS (32 NVIDIA A10G GPUs in total). For pre-training, we used a sample set from Wikipedia.
+Following the methodology described in the paper, we expanded the model from 32 transformer blocks to 48 and then continued pre-training on public data. Pre-training took 3 days on 4 `ml.g5.48xlarge` instances from AWS (32 NVIDIA A10G GPUs in total). For pre-training, we used a sample set from Wikipedia.

### Fine-tuning

-After pre-training, instruction tuning and alignment tuning were performed sequentially. This took only about 10 hours on an ml.g5.24xlarge instance (4 NVIDIA A10G GPUs). Instruction tuning used a sample set of the OpenOrca dataset, and alignment tuning used Intel's orca_dpo_pairs dataset.
+After pre-training, instruction tuning and alignment tuning were performed sequentially. This took only about 10 hours on an AWS `ml.g5.24xlarge` instance (4 NVIDIA A10G GPUs). Instruction tuning used a sample set of the OpenOrca dataset, and alignment tuning used Intel's orca_dpo_pairs dataset.

### References

- Base model: [microsoft/phi-2](https://huggingface.co/microsoft/phi-2)
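For readers unfamiliar with depth up-scaling, the 32 → 48 block expansion described above can be sketched as follows. This is a hypothetical illustration, not the author's actual code: it assumes phi-2 loads as a standard transformers `PhiForCausalLM` exposing `model.layers`, and it follows the SOLAR-style recipe of concatenating two trimmed copies of the base model; the 8-layer trim (2 × 24 = 48) is an assumption about how the 48 blocks were obtained.

```python
import copy

import torch.nn as nn
from transformers import AutoModelForCausalLM

# Load the 32-layer phi-2 base model.
base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto")

n_layers = base.config.num_hidden_layers   # 32 for phi-2
n_trim = 8                                 # assumed trim: 2 * (32 - 8) = 48 blocks

# Bottom copy keeps layers 0..23; top copy keeps (deep copies of) layers 8..31.
bottom = [layer for layer in base.model.layers[: n_layers - n_trim]]
top = [copy.deepcopy(layer) for layer in base.model.layers[n_trim:]]

base.model.layers = nn.ModuleList(bottom + top)
base.config.num_hidden_layers = len(base.model.layers)   # 48

# Re-index the per-layer attention index used for KV caching (attribute name follows
# transformers' Phi implementation; adjust if your version differs).
for i, layer in enumerate(base.model.layers):
    layer.self_attn.layer_idx = i

# Save the depth up-scaled checkpoint, then continue pre-training it on Wikipedia text.
base.save_pretrained("phi-2-dus-48L")
```

The idea behind DUS is that every block in the enlarged model starts from pretrained weights, so continued pre-training only has to heal the seam between the two copies rather than train new layers from scratch.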
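The fine-tuning recipe (instruction tuning on an OpenOrca sample, then alignment tuning on Intel's orca_dpo_pairs) maps naturally onto TRL's SFT and DPO trainers. The sketch below is a rough illustration under assumptions, not the author's pipeline: the checkpoint names, 1% sample size, prompt formatting, and hyperparameters are placeholders, and TRL argument names shift between releases.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

ckpt = "phi-2-dus-48L"                     # hypothetical continued-pre-training output
tokenizer = AutoTokenizer.from_pretrained(ckpt)
tokenizer.pad_token = tokenizer.eos_token  # phi-2's tokenizer has no pad token by default

# --- Stage 1: instruction tuning (SFT) on a sample of OpenOrca ---
orca = load_dataset("Open-Orca/OpenOrca", split="train[:1%]")
# Collapse each example into a single "text" field; a real run would apply a chat template.
orca = orca.map(lambda ex: {"text": ex["question"] + "\n" + ex["response"]})

sft = SFTTrainer(
    model=AutoModelForCausalLM.from_pretrained(ckpt),
    train_dataset=orca,
    args=SFTConfig(output_dir="phi-2-dus-sft", num_train_epochs=1,
                   per_device_train_batch_size=4),
    processing_class=tokenizer,            # `tokenizer=` in older TRL releases
)
sft.train()
sft.save_model("phi-2-dus-sft")

# --- Stage 2: alignment tuning (DPO) on Intel/orca_dpo_pairs ---
pairs = load_dataset("Intel/orca_dpo_pairs", split="train")
pairs = pairs.rename_column("question", "prompt")   # DPOTrainer expects prompt/chosen/rejected
pairs = pairs.remove_columns(["system"])

dpo = DPOTrainer(
    model=AutoModelForCausalLM.from_pretrained("phi-2-dus-sft"),
    train_dataset=pairs,
    args=DPOConfig(output_dir="phi-2-dus-dpo", beta=0.1, num_train_epochs=1,
                   per_device_train_batch_size=2),
    processing_class=tokenizer,            # `tokenizer=` in older TRL releases
)
dpo.train()
dpo.save_model("phi-2-dus-dpo")
```

Running the two stages sequentially mirrors the order described above: the DPO stage starts from the instruction-tuned checkpoint and uses it as its own frozen reference model (TRL creates the reference copy when `ref_model` is not supplied).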