daekeun-ml committed
Commit 84ac753 · verified · Parent: 7647fcf

Update README.md

Files changed (1):
  1. README.md +1 -0
README.md CHANGED
@@ -16,6 +16,7 @@ This model is a model that performed continued pre-training and fine-tuning (ins
 
 ### DUS(Depth Up-Scaling) and continued pre-training
 Similar to the methodology disclosed in the paper, we expanded from 32 transformer blocks to 48 blocks and then continued pre-training with the public dataset. Pre-training was performed for 3 days using 4 `ml.g5.48xlarge` instances from AWS (NVIDIA A10G GPU x 32ea). For pre-training, we used a sample set from Wikipedia.
+Note that performance is not guaranteed since only a small number of datasets were used for the experiment. The number of samples for training set is just around 1.5 million after tokenization.
 For distributed training, all weights were trained without adapter techniques, and sharding parallelization was performed with ZeRO-2. The presets are as follows.
 
 ```json