daekeun-ml committed
Commit 84ac753 · verified · Parent: 7647fcf

Update README.md

Files changed (1):
  1. README.md +1 -0
README.md CHANGED
@@ -16,6 +16,7 @@ This model is a model that performed continued pre-training and fine-tuning (ins
 
 ### DUS(Depth Up-Scaling) and continued pre-training
 Similar to the methodology disclosed in the paper, we expanded from 32 transformer blocks to 48 blocks and then continued pre-training with the public dataset. Pre-training was performed for 3 days using 4 `ml.g5.48xlarge` instances from AWS (NVIDIA A10G GPU x 32ea). For pre-training, we used a sample set from Wikipedia.
+Note that performance is not guaranteed since only a small number of datasets were used for the experiment. The number of samples for training set is just around 1.5 million after tokenization.
 For distributed training, all weights were trained without adapter techniques, and sharding parallelization was performed with ZeRO-2. The presets are as follows.
 
 ```json