I want to reproduce the results, but my training curve is inconsistent with yours.
I loaded the Qwen2.5-32B-Instruct model and used the following dataset settings for training:
"Sky-T1": {
  "hf_hub_url": "NovaSky-AI/Sky-T1_data_17k",
  "formatting": "sharegpt",
  "columns": {
    "messages": "conversations",
    "system": "system"
  },
  "tags": {
    "role_tag": "from",
    "content_tag": "value",
    "user_tag": "user",
    "assistant_tag": "assistant"
  }
},
The other training settings are exactly the same as the ones you reported. However, my training loss starts at around 0.5, and my training curves differ significantly from what you reported. Here is my wandb log: https://wandb.ai/shuqiatwork-minimax/huggingface?nw=nwusershuqiatwork
Can you help me identify the problem? Thanks a lot.
Are you comparing to the loss curve here?
https://huggingface.co/bespokelabs/Bespoke-Stratos-32B/blob/main/training_loss.png
It looks like you have closed the issue, so I'm going to assume that you resolved it :)