Reproduce AIME score

#5
by sqatwork - opened

I fine-tuned the qwen2.5-32b-instruct model on your released dataset with exactly the same settings as yours, but I got lower AIME scores. My final results are: AIME 0.50 (temperature = 0.7); AIME 0.56 (temperature = 0.3); AIME 0.40 (temperature = 0.1). Here is my training loss curve:
[Image: training loss curve (W&B Chart 2025_2_7 23_50_16.png)]
Can you help me with this?
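For reference, the scores above came from a temperature sweep at decode time. Below is a minimal sketch of such an eval loop, assuming vLLM for inference; the prompt file aime_prompts.jsonl, the model path, the tensor_parallel_size, and the naive is_correct checker are all hypothetical stand-ins for the real benchmark harness.

import json

from vllm import LLM, SamplingParams


def is_correct(generation: str, answer: str) -> bool:
    # Naive check: does the reference answer appear verbatim in the output?
    # A real AIME grader extracts and compares the final integer answer.
    return answer.strip() in generation


# Hypothetical eval file: one {"prompt": ..., "answer": ...} object per line.
rows = [json.loads(line) for line in open("aime_prompts.jsonl")]
prompts = [r["prompt"] for r in rows]

llm = LLM(model="path/to/finetuned-qwen2.5-32b-instruct", tensor_parallel_size=8)

for temp in (0.7, 0.3, 0.1):
    outputs = llm.generate(prompts, SamplingParams(temperature=temp, max_tokens=16384))
    acc = sum(is_correct(o.outputs[0].text, r["answer"]) for o, r in zip(outputs, rows)) / len(rows)
    print(f"temperature={temp}: accuracy={acc:.2f}")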

There are some differences between our loss curves:
[Image: learning-rate comparison (lr_compare.png)]

[Image: loss comparison (loss_compare.png)]


This is my training recipe:
$NSYS_PROFILE_ARGS torchrun $DISTRIBUTED_ARGS src/train.py \
    --deepspeed examples/deepspeed/ds_z3_offload_config.json \
    --stage sft \
    --do_train \
    --model_name_or_path $load_dir \
    --dataset $dataset \
    --template qwen25 \
    --finetuning_type full \
    --output_dir $output_dir \
    --overwrite_cache \
    --preprocessing_num_workers 16 \
    --max_samples 1000000 \
    --warmup_ratio 0.1 \
    --weight_decay 0.1 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 12 \
    --ddp_timeout 180000000 \
    --learning_rate 1e-05 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --cutoff_len 16384 \
    --save_steps $save_iter \
    --plot_loss \
    --num_train_epochs $num_train_epochs \
    --bf16 \
    --report_to wandb \
    --run_name $run_name \
    --optim adamw_torch \
    --adam_beta1 0.9 \
    --adam_beta2 0.999 \
    --adam_epsilon 1e-8
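
For reference, the effective global batch size implied by these flags is per_device_train_batch_size x gradient_accumulation_steps x world size. A quick sanity check (the world size of 8 is my assumption; the recipe above does not state the GPU count):

# Effective global batch size implied by the training flags above.
per_device_train_batch_size = 1   # --per_device_train_batch_size
gradient_accumulation_steps = 12  # --gradient_accumulation_steps
world_size = 8                    # assumed GPU count, not stated in the recipe

print(per_device_train_batch_size * gradient_accumulation_steps * world_size)  # 96 under these assumptions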

Update: the difference in loss was caused by the transformers version. After I updated transformers from 4.45.2 to 4.46.1, the loss curves are the same as yours.
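
If anyone else hits this, a small guard at the top of the training script can catch the mismatch early. This is just a sketch; 4.46.1 is taken from the fix above, and treating it as the minimum version is my assumption about which versions reproduce the reference curve.

# Fail fast if the installed transformers version predates the fix above.
import transformers
from packaging import version

if version.parse(transformers.__version__) < version.parse("4.46.1"):
    raise RuntimeError(
        f"transformers {transformers.__version__} is installed; "
        "upgrade to >= 4.46.1 to match the reference loss curves."
    )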
