Text Generation
Transformers
Safetensors
Turkish
English
llama
conversational
text-generation-inference
Inference Endpoints
zolicsaki commited on
Commit
863fd8c
·
verified ·
1 Parent(s): 5086cc9

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -2
README.md CHANGED
@@ -49,9 +49,9 @@ model = AutoModelForCausalLM.from_pretrained("sambanovasystems/SambaLingo-Turkis
49
  ## Training Details
50
  The alignment phase follows the recipe for [Zephyr-7B](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta), and comprises two stages: supervised fine-tuning (SFT) and Direct Performance Optimization (DPO).
51
 
52
- The SFT phase was done on the ultrachat_200k dataset mixed with the Google translated version of the ultrachat_200k dataset. It was trained for one epoch with global batch size 512 and max sequence length 2048 tokens. We used a linear decay learning rate of 2e-5 and 10% warmup.
53
 
54
- The DPO phase was done on the ultrafeedback dataset and cai-conversation-harmless dataset, mixed with 10% of the data Google translated. It was trained with global batch size 32 and for three epochs. We used a linear decay learning rate of 5e-7, 10% warmup and β=0.1 as the regularization factor for DPO.
55
 
56
 
57
  ## Tokenizer Details
 
49
  ## Training Details
50
  The alignment phase follows the recipe for [Zephyr-7B](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta), and comprises two stages: supervised fine-tuning (SFT) and Direct Performance Optimization (DPO).
51
 
52
+ The SFT phase was done on the [ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) dataset mixed with the Google translated version of the ultrachat_200k dataset. It was trained for one epoch with global batch size 512 and max sequence length 2048 tokens. We used a linear decay learning rate of 2e-5 and 10% warmup.
53
 
54
+ The DPO phase was done on the [ultrafeedback](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset and cai-conversation-harmless dataset, mixed with 10% of the data Google translated. It was trained with global batch size 32 and for three epochs. We used a linear decay learning rate of 5e-7, 10% warmup and β=0.1 as the regularization factor for DPO.
55
 
56
 
57
  ## Tokenizer Details