v1_1000_STEPS_5e6_rate_01_beta_DPO

This model is a version of mistralai/Mistral-7B-Instruct-v0.1 fine-tuned with Direct Preference Optimization (DPO) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8921
  • Rewards/chosen: -2.4988
  • Rewards/rejected: -2.4246
  • Rewards/accuracies: 0.4220
  • Rewards/margins: -0.0743
  • Logps/rejected: -41.1251
  • Logps/chosen: -40.2413
  • Logits/rejected: -3.1253
  • Logits/chosen: -3.1250
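
For readers unfamiliar with the DPO-specific metrics above, the following is a minimal sketch (plain PyTorch; function and argument names are illustrative, not taken from the training script) of how such metrics are conventionally computed. The beta of 0.1 is an assumption implied by the "01_beta" in the model name.

```python
import torch
import torch.nn.functional as F

beta = 0.1  # assumption: implied by "01_beta" in the model name


def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps):
    # Implicit DPO reward: beta * (policy log-prob - reference log-prob)
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)

    # Rewards/margins: chosen reward minus rejected reward
    margins = rewards_chosen - rewards_rejected

    # DPO loss: -log(sigmoid(margin)), averaged over the batch
    loss = -F.logsigmoid(margins).mean()

    # Rewards/accuracies: fraction of pairs where the chosen response
    # receives a higher implicit reward than the rejected one
    accuracies = (margins > 0).float().mean()

    return loss, rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracies
```

Under this reading, the negative Rewards/margins and the ~0.42 Rewards/accuracies above mean the policy slightly favors the rejected responses on the evaluation set.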

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
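
The original training script is not included. Below is a minimal sketch of how these hyperparameters could map onto TRL's DPOTrainer; the beta of 0.1 is an assumption based on the model name, and the train/eval datasets are placeholders because the data is not documented.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

args = TrainingArguments(
    output_dir="v1_1000_STEPS_5e6_rate_01_beta_DPO",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.1,                     # assumption: "01_beta" in the model name
    train_dataset=train_dataset,  # placeholder: dataset is not documented
    eval_dataset=eval_dataset,    # placeholder: dataset is not documented
    tokenizer=tokenizer,
)
trainer.train()
```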

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.651 | 0.05 | 50 | 0.7879 | -2.4907 | -2.5558 | 0.4571 | 0.0651 | -42.4372 | -40.1597 | -3.6039 | -3.6039 |
| 1.9419 | 0.1 | 100 | 2.4676 | -13.2591 | -13.5207 | 0.4703 | 0.2616 | -152.0861 | -147.8440 | -2.7117 | -2.7117 |
| 1.6524 | 0.15 | 150 | 1.3126 | -5.5971 | -5.5926 | 0.4615 | -0.0045 | -72.8055 | -71.2240 | -2.8079 | -2.8078 |
| 1.6099 | 0.2 | 200 | 1.2428 | -5.1318 | -5.0698 | 0.4527 | -0.0620 | -67.5774 | -66.5706 | -3.4503 | -3.4503 |
| 1.1547 | 0.24 | 250 | 1.2233 | -4.9777 | -4.9084 | 0.4462 | -0.0693 | -65.9634 | -65.0297 | -3.5880 | -3.5880 |
| 1.5207 | 0.29 | 300 | 1.2174 | -4.9856 | -4.8879 | 0.4330 | -0.0978 | -65.7582 | -65.1095 | -4.0576 | -4.0576 |
| 1.2188 | 0.34 | 350 | 1.2151 | -4.8922 | -4.8034 | 0.4418 | -0.0888 | -64.9137 | -64.1753 | -3.9660 | -3.9660 |
| 2.0083 | 0.39 | 400 | 1.2029 | -4.8769 | -4.7669 | 0.4396 | -0.1100 | -64.5482 | -64.0222 | -4.4976 | -4.4976 |
| 1.8448 | 0.44 | 450 | 1.2058 | -4.9788 | -4.8705 | 0.4593 | -0.1083 | -65.5844 | -65.0407 | -3.8543 | -3.8543 |
| 1.4687 | 0.49 | 500 | 1.2074 | -4.8892 | -4.7952 | 0.4396 | -0.0940 | -64.8317 | -64.1451 | -4.4715 | -4.4715 |
| 1.6526 | 0.54 | 550 | 1.2022 | -4.8909 | -4.7833 | 0.4440 | -0.1075 | -64.7128 | -64.1618 | -4.6009 | -4.6009 |
| 1.0589 | 0.59 | 600 | 1.1967 | -4.8203 | -4.7145 | 0.4352 | -0.1058 | -64.0247 | -63.4561 | -4.5611 | -4.5611 |
| 1.6942 | 0.64 | 650 | 1.1933 | -4.8330 | -4.7203 | 0.4418 | -0.1127 | -64.0824 | -63.5830 | -4.6167 | -4.6168 |
| 1.5352 | 0.68 | 700 | 1.1793 | -4.8254 | -4.7198 | 0.4462 | -0.1056 | -64.0778 | -63.5073 | -4.3657 | -4.3657 |
| 0.9506 | 0.73 | 750 | 0.9935 | -3.9278 | -3.8382 | 0.4374 | -0.0896 | -55.2615 | -54.5315 | -3.3907 | -3.3909 |
| 0.8433 | 0.78 | 800 | 0.9283 | -3.4157 | -3.4161 | 0.4484 | 0.0004 | -51.0402 | -49.4101 | -3.1794 | -3.1797 |
| 1.1375 | 0.83 | 850 | 0.8858 | -2.4124 | -2.3534 | 0.4352 | -0.0590 | -40.4137 | -39.3767 | -3.1266 | -3.1261 |
| 0.8326 | 0.88 | 900 | 0.8873 | -2.4751 | -2.4084 | 0.4220 | -0.0667 | -40.9632 | -40.0042 | -3.1304 | -3.1300 |
| 1.1603 | 0.93 | 950 | 0.8926 | -2.5000 | -2.4252 | 0.4198 | -0.0748 | -41.1319 | -40.2531 | -3.1257 | -3.1254 |
| 0.8716 | 0.98 | 1000 | 0.8921 | -2.4988 | -2.4246 | 0.4220 | -0.0743 | -41.1251 | -40.2413 | -3.1253 | -3.1250 |

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2
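
For completeness, a minimal inference sketch compatible with the Transformers version listed above. The repository id is taken from this model card, the weights are loaded in half precision, and the [INST] prompt format follows the Mistral-Instruct convention; adjust as needed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "tsavage68/v1_1000_STEPS_5e6_rate_01_beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "[INST] Summarize what DPO fine-tuning changes about a model. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```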