---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
- generated_from_trainer
datasets:
- snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
base_model: mistralai/Mistral-7B-Instruct-v0.2
model-index:
- name: zephyr-7b-dpo-lora-pairrm
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpo-lora-pairrm

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6747
- Rewards/chosen: -1.3181
- Rewards/rejected: -1.4367
- Rewards/accuracies: 0.5727
- Rewards/margins: 0.1186
- Logps/rejected: -357.3805
- Logps/chosen: -340.2056
- Logits/rejected: -4.5482
- Logits/chosen: -4.5594

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6917        | 0.08  | 100  | 0.6924          | -0.0160        | -0.0177          | 0.5287             | 0.0016          | -215.4761      | -210.0007    | -2.5273         | -2.5303       |
| 0.6854        | 0.16  | 200  | 0.6875          | -0.0702        | -0.0835          | 0.5563             | 0.0133          | -222.0610      | -215.4225    | -2.5736         | -2.5764       |
| 0.682         | 0.24  | 300  | 0.6841          | -0.2388        | -0.2651          | 0.5450             | 0.0263          | -240.2197      | -232.2801    | -2.9180         | -2.9209       |
| 0.6634        | 0.32  | 400  | 0.6812          | -0.4832        | -0.5288          | 0.5487             | 0.0455          | -266.5857      | -256.7237    | -3.4549         | -3.4603       |
| 0.6296        | 0.4   | 500  | 0.6782          | -0.6896        | -0.7564          | 0.5600             | 0.0668          | -289.3543      | -277.3629    | -4.1668         | -4.1749       |
| 0.6503        | 0.48  | 600  | 0.6770          | -0.9588        | -1.0440          | 0.5533             | 0.0852          | -318.1134      | -304.2834    | -4.4345         | -4.4433       |
| 0.5974        | 0.56  | 700  | 0.6778          | -1.1455        | -1.2432          | 0.5653             | 0.0977          | -338.0312      | -322.9485    | -4.4370         | -4.4480       |
| 0.6508        | 0.64  | 800  | 0.6748          | -1.1002        | -1.2023          | 0.5650             | 0.1022          | -333.9435      | -318.4168    | -4.2618         | -4.2711       |
| 0.6746        | 0.72  | 900  | 0.6757          | -1.3289        | -1.4445          | 0.5687             | 0.1155          | -358.1558      | -341.2940    | -4.5662         | -4.5772       |
| 0.6151        | 0.8   | 1000 | 0.6755          | -1.3559        | -1.4746          | 0.5690             | 0.1187          | -361.1742      | -343.9893    | -4.6070         | -4.6184       |
| 0.6837        | 0.88  | 1100 | 0.6748          | -1.3246        | -1.4437          | 0.5710             | 0.1192          | -358.0839      | -340.8576    | -4.5607         | -4.5717       |
| 0.6539        | 0.96  | 1200 | 0.6746          | -1.3182        | -1.4369          | 0.5730             | 0.1187          | -357.4036      | -340.2231    | -4.5483         | -4.5595       |


### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.0