Update README.md
README.md
CHANGED
@@ -46,6 +46,12 @@ model = AutoModelForCausalLM.from_pretrained("sambanovasystems/SambaLingo-Turkis
 ## Evaluation Results
 
 ## Training Details
+The alignment phase follows the recipe for [Zephyr-7B](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) and comprises two stages: supervised fine-tuning (SFT) and Direct Preference Optimization (DPO).
+
+The SFT phase was done on the ultrachat_200k dataset mixed with a Google-translated version of the ultrachat_200k dataset. It was trained for one epoch with a global batch size of 512 and a maximum sequence length of 2048 tokens. We used a linear decay learning rate of 2e-5 with 10% warmup.
+
+The DPO phase was done on the ultrafeedback and cai-conversation-harmless datasets, mixed with 10% of the data Google-translated. It was trained for three epochs with a global batch size of 32. We used a linear decay learning rate of 5e-7, 10% warmup, and β=0.1 as the regularization factor for DPO.
+
 
 ## Tokenizer Details
 We extended the vocabulary of the base llama model from 32,000 tokens to 57,000 tokens by adding up to 25,000 non-overlapping tokens from the new language.
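As a reference for the SFT hyperparameters above, here is a minimal sketch of the matching Hugging Face `TrainingArguments`. The output path and the per-device/accumulation split are hypothetical (they only need to multiply out to the stated global batch size of 512), and the 2048-token max sequence length is enforced by the data pipeline rather than by `TrainingArguments`.

```python
from transformers import TrainingArguments

# SFT hyperparameters from the Training Details above. The split of
# 8 GPUs x 8 per device x 8 accumulation steps is one hypothetical way
# to reach the stated global batch size of 512.
sft_args = TrainingArguments(
    output_dir="sft-checkpoints",    # hypothetical path
    num_train_epochs=1,              # one epoch
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,              # peak learning rate
    lr_scheduler_type="linear",      # linear decay
    warmup_ratio=0.1,                # 10% warmup
)
```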
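Likewise, a minimal sketch of the DPO stage, assuming TRL's `DPOConfig`/`DPOTrainer` (the card does not name a training framework, and TRL argument names have shifted across releases, e.g. newer versions accept `processing_class` in place of `tokenizer`). The paths, batch split, and the `model`/`preference_dataset`/`tokenizer` variables are hypothetical placeholders.

```python
from trl import DPOConfig, DPOTrainer

# DPO hyperparameters from the Training Details above.
dpo_args = DPOConfig(
    output_dir="dpo-checkpoints",    # hypothetical path
    num_train_epochs=3,              # three epochs
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,   # 4 x 8 = global batch size 32 on one GPU
    learning_rate=5e-7,              # peak learning rate
    lr_scheduler_type="linear",      # linear decay
    warmup_ratio=0.1,                # 10% warmup
    beta=0.1,                        # DPO regularization factor
)

trainer = DPOTrainer(
    model=model,                       # the SFT checkpoint
    args=dpo_args,
    train_dataset=preference_dataset,  # prompt/chosen/rejected pairs
    tokenizer=tokenizer,
)
trainer.train()
```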
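The vocabulary extension under Tokenizer Details follows the standard transformers pattern of adding tokens and resizing the embeddings. A minimal sketch, assuming Llama-2-7b as the base checkpoint and a hypothetical `new_language_tokens` list mined from a corpus in the new language:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# add_tokens() skips strings already present in the vocabulary, so only
# non-overlapping tokens from the new language are actually added.
num_added = tokenizer.add_tokens(new_language_tokens)

# Grow the embedding matrices to cover the extended vocabulary.
model.resize_token_embeddings(len(tokenizer))
```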