---
library_name: transformers
license: other
base_model: flydust/Llama-3.1-Minitron-4B-Magpie-Gemma2-9B-550K
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Llama-3.1-Minitron-4B-Magpie-SFT-G550K-MT-Magpo-3.1-Pro-015Mix
    results: []
---

Llama-3.1-Minitron-4B-Magpie-SFT-G550K-MT-Magpo-3.1-Pro-015Mix

This model is a fine-tuned version of flydust/Llama-3.1-Minitron-4B-Magpie-Gemma2-9B-550K; the preference dataset used for training is not specified in this card. It achieves the following results on the evaluation set (the DPO metric definitions are summarized in a note after the list):

  • Loss: 0.4942
  • Rewards/chosen: -3.3377
  • Rewards/rejected: -4.2603
  • Rewards/accuracies: 0.7620
  • Rewards/margins: 0.9226
  • Logps/rejected: -928.2655
  • Logps/chosen: -844.1144
  • Logits/rejected: -1.4591
  • Logits/chosen: -1.4783
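
For readers unfamiliar with the reward columns, they follow the standard Direct Preference Optimization bookkeeping that TRL reports. This is general background rather than information from the card, and the DPO temperature β used for this run is not stated here.

```latex
% Implicit DPO reward: scaled log-probability ratio of the policy vs. the frozen reference model
r_\theta(x, y) = \beta \,\bigl(\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\bigr)

% Reported margin, and the pairwise loss the trainer minimizes
\text{Rewards/margins} = r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}),
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(\text{Rewards/margins}\bigr)
```

Rewards/accuracies is the fraction of evaluation pairs in which the chosen response receives the higher implicit reward, while Logps/chosen and Logps/rejected are the policy's log-probabilities of the chosen and rejected responses.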

Model description

More information needed

Intended uses & limitations

More information needed
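
Pending author-provided guidance, the snippet below is a minimal inference sketch with transformers. It assumes the repository id matches the model name above and that the tokenizer ships a chat template; neither is confirmed by this card.

```python
# Minimal inference sketch (assumptions: repo id as below, chat template bundled with the tokenizer).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "flydust/Llama-3.1-Minitron-4B-Magpie-SFT-G550K-MT-Magpo-3.1-Pro-015Mix"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a single GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain what DPO fine-tuning does in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```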

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1.5e-07
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
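
As a rough guide only, the hyperparameters above map onto a TRL DPO setup roughly as sketched below. The preference dataset, DPO beta, and precision settings are not reported in this card, so the values marked as assumptions are placeholders, not the author's actual configuration.

```python
# Hypothetical sketch of a TRL DPO run matching the hyperparameters listed above.
# The dataset choice and beta are assumptions; they are not taken from this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "flydust/Llama-3.1-Minitron-4B-Magpie-Gemma2-9B-550K"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Placeholder preference data: any dataset of chosen/rejected pairs that DPOTrainer accepts.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(
    output_dir="Llama-3.1-Minitron-4B-Magpie-SFT-G550K-MT-Magpo-3.1-Pro-015Mix",
    learning_rate=1.5e-7,
    per_device_train_batch_size=2,     # x 4 GPUs x 16 accumulation steps = 128 effective
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,                          # not reported in the card; TRL's default value
)

trainer = DPOTrainer(
    model=model,                       # the reference model is cloned from `model` by default
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,        # `tokenizer=` in older TRL releases
)
trainer.train()
```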

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6911        | 0.0653 | 100  | 0.6912          | -0.0026        | -0.0066          | 0.5640             | 0.0041          | -502.9037      | -510.6042    | -1.7834         | -1.7781       |
| 0.6703        | 0.1306 | 200  | 0.6713          | -0.1429        | -0.1981          | 0.6380             | 0.0552          | -522.0521      | -524.6394    | -1.7686         | -1.7593       |
| 0.6306        | 0.1959 | 300  | 0.6347          | -0.6439        | -0.8210          | 0.6840             | 0.1770          | -584.3356      | -574.7375    | -1.7536         | -1.7436       |
| 0.5831        | 0.2612 | 400  | 0.5932          | -1.5155        | -1.8774          | 0.7070             | 0.3619          | -689.9788      | -661.8920    | -1.6963         | -1.6877       |
| 0.5447        | 0.3266 | 500  | 0.5645          | -2.1858        | -2.7052          | 0.7110             | 0.5195          | -772.7636      | -728.9221    | -1.6249         | -1.6207       |
| 0.5896        | 0.3919 | 600  | 0.5453          | -2.3771        | -2.9747          | 0.7180             | 0.5976          | -799.7122      | -748.0584    | -1.5836         | -1.5847       |
| 0.5342        | 0.4572 | 700  | 0.5305          | -2.6231        | -3.3063          | 0.7350             | 0.6832          | -832.8744      | -772.6592    | -1.5454         | -1.5524       |
| 0.511         | 0.5225 | 800  | 0.5177          | -3.0517        | -3.8393          | 0.7400             | 0.7876          | -886.1714      | -815.5145    | -1.5160         | -1.5273       |
| 0.5007        | 0.5878 | 900  | 0.5088          | -3.0925        | -3.9197          | 0.7540             | 0.8273          | -894.2120      | -819.5908    | -1.5007         | -1.5144       |
| 0.485         | 0.6531 | 1000 | 0.5033          | -3.1305        | -3.9863          | 0.7630             | 0.8558          | -900.8680      | -823.3940    | -1.4834         | -1.4997       |
| 0.4307        | 0.7184 | 1100 | 0.4989          | -3.1387        | -4.0097          | 0.7610             | 0.8710          | -903.2113      | -824.2159    | -1.4728         | -1.4911       |
| 0.5403        | 0.7837 | 1200 | 0.4964          | -3.3418        | -4.2574          | 0.7620             | 0.9156          | -927.9747      | -844.5242    | -1.4641         | -1.4822       |
| 0.5182        | 0.8490 | 1300 | 0.4952          | -3.3255        | -4.2430          | 0.7600             | 0.9175          | -926.5396      | -842.8945    | -1.4601         | -1.4788       |
| 0.5165        | 0.9144 | 1400 | 0.4943          | -3.3308        | -4.2525          | 0.7600             | 0.9217          | -927.4913      | -843.4282    | -1.4610         | -1.4799       |
| 0.5192        | 0.9797 | 1500 | 0.4942          | -3.3377        | -4.2603          | 0.7620             | 0.9226          | -928.2655      | -844.1144    | -1.4591         | -1.4783       |

Framework versions

  • Transformers 4.45.0.dev0
  • Pytorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1