---
library_name: transformers
license: other
base_model: flydust/Llama-3.1-Minitron-4B-Magpie-Gemma2-9B-550K
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Llama-3.1-Minitron-4B-Magpie-SFT-G550K-MT-Magpo-3.1-Pro-015Mix
    results: []
---

Llama-3.1-Minitron-4B-Magpie-SFT-G550K-MT-Magpo-3.1-Pro-015Mix

This model is a fine-tuned version of flydust/Llama-3.1-Minitron-4B-Magpie-Gemma2-9B-550K; the preference dataset used for training is not specified in this card. It achieves the following results on the evaluation set (the DPO metric definitions are summarized in a note after the list):

  • Loss: 0.4942
  • Rewards/chosen: -3.3377
  • Rewards/rejected: -4.2603
  • Rewards/accuracies: 0.7620
  • Rewards/margins: 0.9226
  • Logps/rejected: -928.2655
  • Logps/chosen: -844.1144
  • Logits/rejected: -1.4591
  • Logits/chosen: -1.4783
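
For readers unfamiliar with the reward columns, they follow the standard Direct Preference Optimization bookkeeping that TRL reports. This is general background rather than information from the card, and the DPO temperature β used for this run is not stated here.

```latex
% Implicit DPO reward: scaled log-probability ratio of the policy vs. the frozen reference model
r_\theta(x, y) = \beta \,\bigl(\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\bigr)

% Reported margin, and the pairwise loss the trainer minimizes
\text{Rewards/margins} = r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}),
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(\text{Rewards/margins}\bigr)
```

Rewards/accuracies is the fraction of evaluation pairs in which the chosen response receives the higher implicit reward, while Logps/chosen and Logps/rejected are the policy's log-probabilities of the chosen and rejected responses.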

Model description

More information needed

Intended uses & limitations

More information needed
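
Pending author-provided guidance, the snippet below is a minimal inference sketch with transformers. It assumes the repository id matches the model name above and that the tokenizer ships a chat template; neither is confirmed by this card.

```python
# Minimal inference sketch (assumptions: repo id as below, chat template bundled with the tokenizer).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "flydust/Llama-3.1-Minitron-4B-Magpie-SFT-G550K-MT-Magpo-3.1-Pro-015Mix"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a single GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain what DPO fine-tuning does in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```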

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1.5e-07
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
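
As a rough guide only, the hyperparameters above map onto a TRL DPO setup roughly as sketched below. The preference dataset, DPO beta, and precision settings are not reported in this card, so the values marked as assumptions are placeholders, not the author's actual configuration.

```python
# Hypothetical sketch of a TRL DPO run matching the hyperparameters listed above.
# The dataset choice and beta are assumptions; they are not taken from this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "flydust/Llama-3.1-Minitron-4B-Magpie-Gemma2-9B-550K"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Placeholder preference data: any dataset of chosen/rejected pairs that DPOTrainer accepts.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(
    output_dir="Llama-3.1-Minitron-4B-Magpie-SFT-G550K-MT-Magpo-3.1-Pro-015Mix",
    learning_rate=1.5e-7,
    per_device_train_batch_size=2,     # x 4 GPUs x 16 accumulation steps = 128 effective
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,                          # not reported in the card; TRL's default value
)

trainer = DPOTrainer(
    model=model,                       # the reference model is cloned from `model` by default
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,        # `tokenizer=` in older TRL releases
)
trainer.train()
```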

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6911        | 0.0653 | 100  | 0.6912          | -0.0026        | -0.0066          | 0.5640             | 0.0041          | -502.9037      | -510.6042    | -1.7834         | -1.7781       |
| 0.6703        | 0.1306 | 200  | 0.6713          | -0.1429        | -0.1981          | 0.6380             | 0.0552          | -522.0521      | -524.6394    | -1.7686         | -1.7593       |
| 0.6306        | 0.1959 | 300  | 0.6347          | -0.6439        | -0.8210          | 0.6840             | 0.1770          | -584.3356      | -574.7375    | -1.7536         | -1.7436       |
| 0.5831        | 0.2612 | 400  | 0.5932          | -1.5155        | -1.8774          | 0.7070             | 0.3619          | -689.9788      | -661.8920    | -1.6963         | -1.6877       |
| 0.5447        | 0.3266 | 500  | 0.5645          | -2.1858        | -2.7052          | 0.7110             | 0.5195          | -772.7636      | -728.9221    | -1.6249         | -1.6207       |
| 0.5896        | 0.3919 | 600  | 0.5453          | -2.3771        | -2.9747          | 0.7180             | 0.5976          | -799.7122      | -748.0584    | -1.5836         | -1.5847       |
| 0.5342        | 0.4572 | 700  | 0.5305          | -2.6231        | -3.3063          | 0.7350             | 0.6832          | -832.8744      | -772.6592    | -1.5454         | -1.5524       |
| 0.511         | 0.5225 | 800  | 0.5177          | -3.0517        | -3.8393          | 0.7400             | 0.7876          | -886.1714      | -815.5145    | -1.5160         | -1.5273       |
| 0.5007        | 0.5878 | 900  | 0.5088          | -3.0925        | -3.9197          | 0.7540             | 0.8273          | -894.2120      | -819.5908    | -1.5007         | -1.5144       |
| 0.485         | 0.6531 | 1000 | 0.5033          | -3.1305        | -3.9863          | 0.7630             | 0.8558          | -900.8680      | -823.3940    | -1.4834         | -1.4997       |
| 0.4307        | 0.7184 | 1100 | 0.4989          | -3.1387        | -4.0097          | 0.7610             | 0.8710          | -903.2113      | -824.2159    | -1.4728         | -1.4911       |
| 0.5403        | 0.7837 | 1200 | 0.4964          | -3.3418        | -4.2574          | 0.7620             | 0.9156          | -927.9747      | -844.5242    | -1.4641         | -1.4822       |
| 0.5182        | 0.8490 | 1300 | 0.4952          | -3.3255        | -4.2430          | 0.7600             | 0.9175          | -926.5396      | -842.8945    | -1.4601         | -1.4788       |
| 0.5165        | 0.9144 | 1400 | 0.4943          | -3.3308        | -4.2525          | 0.7600             | 0.9217          | -927.4913      | -843.4282    | -1.4610         | -1.4799       |
| 0.5192        | 0.9797 | 1500 | 0.4942          | -3.3377        | -4.2603          | 0.7620             | 0.9226          | -928.2655      | -844.1144    | -1.4591         | -1.4783       |

Framework versions

  • Transformers 4.45.0.dev0
  • Pytorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1