---
license: apache-2.0
library_name: peft
tags:
  - alignment-handbook
  - generated_from_trainer
  - trl
  - dpo
datasets:
  - snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
base_model: mistralai/Mistral-7B-Instruct-v0.2
model-index:
  - name: zephyr-7b-dpo-lora-pairrm
    results: []
---

zephyr-7b-dpo-lora-pairrm

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6747
  • Rewards/chosen: -1.3181
  • Rewards/rejected: -1.4367
  • Rewards/accuracies: 0.5727
  • Rewards/margins: 0.1186
  • Logps/rejected: -357.3805
  • Logps/chosen: -340.2056
  • Logits/rejected: -4.5482
  • Logits/chosen: -4.5594

Model description

zephyr-7b-dpo-lora-pairrm is a LoRA (PEFT) adapter for mistralai/Mistral-7B-Instruct-v0.2, trained with Direct Preference Optimization (DPO) via TRL and the alignment-handbook tooling on the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset. Because it is a PEFT adapter, the base model must be loaded alongside it.
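
This card does not include usage code. As a minimal sketch, assuming the adapter is hosted at `shenxq/zephyr-7b-dpo-lora-pairrm` (the repo id is not stated in this card), it can be attached to the base model with PEFT and Transformers:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_id = "shenxq/zephyr-7b-dpo-lora-pairrm"  # assumed repo id; adjust to the actual adapter location

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"  # device_map requires accelerate
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA adapter on top of the frozen base

# Mistral-Instruct uses the [INST] ... [/INST] chat format; the tokenizer's chat template builds it.
messages = [{"role": "user", "content": "Explain Direct Preference Optimization in one paragraph."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```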

Intended uses & limitations

More information needed

Training and evaluation data

Training and evaluation both use snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset, a preference dataset of prompts paired with chosen and rejected responses ranked by PairRM.
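
A quick way to inspect the data (split and column names are not listed in this card, so check the actual schema; DPO-style datasets typically expose prompt, chosen, and rejected columns):

```python
from datasets import load_dataset

# Load the preference data used for DPO training and evaluation.
ds = load_dataset("snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset")
print(ds)  # available splits, column names, and sizes

first_split = list(ds.keys())[0]
print(ds[first_split][0])  # one preference record
```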

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
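
The exact alignment-handbook training script is not part of this card. As a rough, non-authoritative sketch of how these hyperparameters could map onto TRL's DPOTrainer (the LoRA configuration, DPO beta, and dataset split name are assumptions, not taken from this card):

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id)

dataset = load_dataset("snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset")  # split name assumed below

peft_config = LoraConfig(  # LoRA rank/alpha/dropout are assumptions, not stated in this card
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-lora-pairrm",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 4 x 4 = total train batch size of 16 reported above
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model,
    ref_model=None,                   # with peft_config set, TRL uses the adapter-disabled base as reference
    args=args,
    beta=0.1,                         # assumed; the DPO beta is not reported in this card
    train_dataset=dataset["train"],   # split name assumed
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```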

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6917 | 0.08 | 100 | 0.6924 | -0.0160 | -0.0177 | 0.5287 | 0.0016 | -215.4761 | -210.0007 | -2.5273 | -2.5303 |
| 0.6854 | 0.16 | 200 | 0.6875 | -0.0702 | -0.0835 | 0.5563 | 0.0133 | -222.0610 | -215.4225 | -2.5736 | -2.5764 |
| 0.682  | 0.24 | 300 | 0.6841 | -0.2388 | -0.2651 | 0.5450 | 0.0263 | -240.2197 | -232.2801 | -2.9180 | -2.9209 |
| 0.6634 | 0.32 | 400 | 0.6812 | -0.4832 | -0.5288 | 0.5487 | 0.0455 | -266.5857 | -256.7237 | -3.4549 | -3.4603 |
| 0.6296 | 0.4  | 500 | 0.6782 | -0.6896 | -0.7564 | 0.5600 | 0.0668 | -289.3543 | -277.3629 | -4.1668 | -4.1749 |
| 0.6503 | 0.48 | 600 | 0.6770 | -0.9588 | -1.0440 | 0.5533 | 0.0852 | -318.1134 | -304.2834 | -4.4345 | -4.4433 |
| 0.5974 | 0.56 | 700 | 0.6778 | -1.1455 | -1.2432 | 0.5653 | 0.0977 | -338.0312 | -322.9485 | -4.4370 | -4.4480 |
| 0.6508 | 0.64 | 800 | 0.6748 | -1.1002 | -1.2023 | 0.5650 | 0.1022 | -333.9435 | -318.4168 | -4.2618 | -4.2711 |
| 0.6746 | 0.72 | 900 | 0.6757 | -1.3289 | -1.4445 | 0.5687 | 0.1155 | -358.1558 | -341.2940 | -4.5662 | -4.5772 |
| 0.6151 | 0.8  | 1000 | 0.6755 | -1.3559 | -1.4746 | 0.5690 | 0.1187 | -361.1742 | -343.9893 | -4.6070 | -4.6184 |
| 0.6837 | 0.88 | 1100 | 0.6748 | -1.3246 | -1.4437 | 0.5710 | 0.1192 | -358.0839 | -340.8576 | -4.5607 | -4.5717 |
| 0.6539 | 0.96 | 1200 | 0.6746 | -1.3182 | -1.4369 | 0.5730 | 0.1187 | -357.4036 | -340.2231 | -4.5483 | -4.5595 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.0