daekeun-ml committed
Commit 824c899 · verified · 1 Parent(s): 929b4d1

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -15,10 +15,10 @@ language:
 
 This model was built with continued pre-training and fine-tuning (instruction tuning), using the depth up-scaling (DUS) technique disclosed by Upstage.
 
 ### DUS (Depth Up-Scaling) and continued pre-training
-Similar to the methodology disclosed in the paper, we expanded the model from 32 transformer blocks to 48 blocks and then performed continued pre-training on a public dataset. Pre-training took 3 days on 4 ml.g5.48xlarge instances from AWS (32 NVIDIA A10G GPUs in total). For pre-training, we used a sample set from Wikipedia.
+Similar to the methodology disclosed in the paper, we expanded the model from 32 transformer blocks to 48 blocks and then performed continued pre-training on a public dataset. Pre-training took 3 days on 4 `ml.g5.48xlarge` instances from AWS (32 NVIDIA A10G GPUs in total). For pre-training, we used a sample set from Wikipedia.
 
 ### Fine-tuning
-After pre-training, instruction tuning and alignment tuning were performed sequentially. This process took only about 10 hours on a single ml.g5.24xlarge instance (4 NVIDIA A10G GPUs). The instruction-tuning data is a sample of the OpenOrca dataset, and the alignment-tuning data is Intel's orca_dpo_pairs dataset.
+After pre-training, instruction tuning and alignment tuning were performed sequentially. This process took only about 10 hours on a single AWS `ml.g5.24xlarge` instance (4 NVIDIA A10G GPUs). The instruction-tuning data is a sample of the OpenOrca dataset, and the alignment-tuning data is Intel's orca_dpo_pairs dataset.
 
 ### References
 - Base model: [microsoft/phi-2](https://huggingface.co/microsoft/phi-2)
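
The DUS expansion described in the diff above (32 → 48 transformer blocks) follows Upstage's depth up-scaling recipe: stack two copies of the base model after dropping some middle layers from each copy. The snippet below is only a minimal sketch of that idea for phi-2, assuming the native `transformers` Phi implementation (`model.model.layers`) and an illustrative 24 + 24 layer split; the card does not state the exact split that was used.

```python
# Minimal depth up-scaling (DUS) sketch: 32 -> 48 decoder blocks for phi-2.
# The 24 + 24 split (first 24 layers of one copy, last 24 of a duplicate)
# mirrors the SOLAR-style recipe and is an assumption, not this model's
# confirmed configuration.
import copy

import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", torch_dtype=torch.bfloat16
)

n_keep = 24  # 24 + 24 = 48 blocks after stacking
donor_layers = copy.deepcopy(base.model.layers)  # independent second copy

# Keep the bottom 24 layers of the original and the top 24 of the duplicate;
# the up-scaled model is then continued-pre-trained (a Wikipedia sample here)
# to recover from the layer surgery.
base.model.layers = torch.nn.ModuleList(
    list(base.model.layers[:n_keep]) + list(donor_layers[-n_keep:])
)
base.config.num_hidden_layers = len(base.model.layers)

base.save_pretrained("phi-2-dus-48L")  # hypothetical output path
```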
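For the fine-tuning step, the diff mentions alignment tuning on Intel's orca_dpo_pairs, a DPO-style preference dataset (system/question/chosen/rejected columns). Below is a rough sketch of what that stage could look like with `trl`'s `DPOTrainer`; the starting checkpoint name, hyperparameters, and exact trainer arguments are assumptions (argument names vary across `trl` versions), since the card does not publish the training script.

```python
# Rough sketch of the alignment-tuning (DPO) stage on Intel/orca_dpo_pairs.
# Checkpoint name and hyperparameters are illustrative; DPOTrainer argument
# names differ between trl versions (shown here in the trl 0.7-style API).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

ckpt = "phi-2-dus-48L-sft"  # hypothetical checkpoint after instruction tuning
model = AutoModelForCausalLM.from_pretrained(ckpt)
ref_model = AutoModelForCausalLM.from_pretrained(ckpt)  # frozen reference
tokenizer = AutoTokenizer.from_pretrained(ckpt)

def to_preference_format(example):
    # Map orca_dpo_pairs columns to the prompt/chosen/rejected format
    # that DPOTrainer expects.
    return {
        "prompt": f"{example['system']}\n{example['question']}",
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }

train_dataset = load_dataset("Intel/orca_dpo_pairs", split="train").map(
    to_preference_format
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=TrainingArguments(
        output_dir="phi-2-dus-dpo",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        bf16=True,
    ),
    beta=0.1,  # DPO regularization strength; illustrative value
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```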