Update README.md
README.md
CHANGED
@@ -49,9 +49,9 @@ model = AutoModelForCausalLM.from_pretrained("sambanovasystems/SambaLingo-Turkis
 ## Training Details
 The alignment phase follows the recipe for [Zephyr-7B](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta), and comprises two stages: supervised fine-tuning (SFT) and Direct Preference Optimization (DPO).

-The SFT phase was done on the ultrachat_200k dataset mixed with the Google translated version of the ultrachat_200k dataset. It was trained for one epoch with global batch size 512 and max sequence length 2048 tokens. We used a linear decay learning rate of 2e-5 and 10% warmup.
+The SFT phase was done on the [ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) dataset mixed with the Google translated version of the ultrachat_200k dataset. It was trained for one epoch with global batch size 512 and max sequence length 2048 tokens. We used a linear decay learning rate of 2e-5 and 10% warmup.

-The DPO phase was done on the ultrafeedback dataset and cai-conversation-harmless dataset, mixed with 10% of the data Google translated. It was trained with global batch size 32 and for three epochs. We used a linear decay learning rate of 5e-7, 10% warmup and β=0.1 as the regularization factor for DPO.
+The DPO phase was done on the [ultrafeedback](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset and cai-conversation-harmless dataset, mixed with 10% of the data Google translated. It was trained with global batch size 32 and for three epochs. We used a linear decay learning rate of 5e-7, 10% warmup and β=0.1 as the regularization factor for DPO.


 ## Tokenizer Details
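
Note on the schedule wording in the training details: "linear decay learning rate of 2e-5 and 10% warmup" can be read as a linear ramp from 0 to the peak rate over the first 10% of steps, followed by a linear decay to 0. The README does not spell this out, so the sketch below is only one plausible reading; the function name, the `total_steps` argument, and the decay-to-zero endpoint are assumptions for illustration, not taken from the model card.

```python
def linear_warmup_decay_lr(step, total_steps, peak_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay.

    Assumed reading of the README: warm up from 0 to `peak_lr` over the first
    `warmup_ratio` fraction of steps, then decay linearly to 0 at `total_steps`.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if warmup_steps > 0 and step < warmup_steps:
        # Warmup: scale the peak rate by the fraction of warmup completed.
        return peak_lr * step / warmup_steps
    # Decay: scale the peak rate by the fraction of post-warmup steps remaining.
    decay_steps = max(total_steps - warmup_steps, 1)
    return peak_lr * max(total_steps - step, 0) / decay_steps


# SFT values from the README: peak 2e-5, 10% warmup (the step count is made up).
sft_lrs = [linear_warmup_decay_lr(s, total_steps=1000) for s in (0, 50, 100, 550, 1000)]

# DPO values from the README: peak 5e-7, 10% warmup (the step count is made up).
dpo_lrs = [linear_warmup_decay_lr(s, total_steps=1000, peak_lr=5e-7) for s in (0, 100, 1000)]
```

Under this reading the rate peaks exactly at the end of warmup and reaches zero on the final step; a schedule that decays to a nonzero floor would be equally consistent with the README text.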