---
license: apache-2.0
base_model: mosaicml/mpt-7b-instruct
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: v1_1000_STEPS_1e6_rate_03_beta_DPO
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# v1_1000_STEPS_1e6_rate_03_beta_DPO

This model is a fine-tuned version of [mosaicml/mpt-7b-instruct](https://huggingface.co/mosaicml/mpt-7b-instruct), trained with DPO (Direct Preference Optimization) using TRL on an unknown dataset.
It achieves the following results on the evaluation set at the final training step (1000):
- Loss: 0.6641
- Rewards/chosen: -1.4066
- Rewards/rejected: -1.6576
- Rewards/accuracies: 0.6198
- Rewards/margins: 0.2510
- Logps/rejected: -27.0829
- Logps/chosen: -25.4808
- Logits/rejected: 13.3887
- Logits/chosen: 13.3921
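For readers unfamiliar with these metrics, the sketch below assumes TRL's default DPO setup (the "sigmoid" loss without label smoothing): each completion's implicit reward is β times (policy log-prob minus reference log-prob), `Rewards/margins` is the mean of chosen minus rejected reward, and the per-pair loss is -log σ(margin). It checks that the reported margin is the difference of the two reward values, and that a zero margin gives ln 2 ≈ 0.6931, matching the validation loss logged at step 50. The function name `dpo_loss` is illustrative.

```python
import math

# Assumes TRL's default "sigmoid" DPO loss without label smoothing.
# Implicit reward of a completion: beta * (policy log-prob - reference log-prob).
# Rewards/margins: mean over eval pairs of (chosen reward - rejected reward).


def dpo_loss(margin: float) -> float:
    """Per-pair DPO loss, -log(sigmoid(margin)), with margin = chosen reward - rejected reward."""
    # -log(sigmoid(m)) == log(1 + exp(-m)); log1p keeps small values accurate.
    return math.log1p(math.exp(-margin))


# Final evaluation metrics from the list above.
rewards_chosen = -1.4066
rewards_rejected = -1.6576

margin = rewards_chosen - rewards_rejected
print(f"margin: {margin:.4f}")  # 0.2510, the reported Rewards/margins
print(f"loss at zero margin: {dpo_loss(0.0):.4f}")  # 0.6931, i.e. ln 2
print(f"loss at the mean margin: {dpo_loss(margin):.4f}")
```

The reported loss (0.6641) is an average of per-pair losses, so it cannot be smaller than `dpo_loss(0.2510)` ≈ 0.58: the loss is convex in the margin, and the averaging happens after the nonlinearity.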

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
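The learning-rate schedule implied by these settings can be sketched as follows. It assumes a single device (the card does not say) and re-implements the shape of the Transformers `cosine` scheduler with warmup: a linear ramp to the peak rate over `lr_scheduler_warmup_steps`, then a half-cosine decay to zero at `training_steps`. The constant and function names are illustrative.

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 1e-6
TRAIN_BATCH_SIZE = 2   # per device
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
NUM_DEVICES = 1        # assumption: not stated in the card

# Examples consumed per optimizer step.
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_train_batch_size)  # 4
# Peaks at 1e-6 at step 100 and reaches 0 at step 1000.
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step):.2e}")
```

With a peak rate of only 1e-06, half of the run (steps 550 to 1000) is spent below 5e-07, which is consistent with the metrics in the table changing very little over the second half of training.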

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6901 | 0.05 | 50 | 0.6931 | 0.0510 | 0.0490 | 0.5253 | 0.0019 | -21.3940 | -20.6223 | 14.3181 | 14.3207 |
| 0.7257 | 0.1 | 100 | 0.6841 | 0.0934 | 0.0501 | 0.5692 | 0.0433 | -21.3906 | -20.4809 | 14.1613 | 14.1641 |
| 0.7259 | 0.15 | 150 | 0.6925 | -0.0147 | -0.0834 | 0.5451 | 0.0688 | -21.8355 | -20.8411 | 13.9200 | 13.9229 |
| 0.6593 | 0.2 | 200 | 0.7118 | 0.4903 | 0.3962 | 0.5802 | 0.0941 | -20.2368 | -19.1579 | 13.7791 | 13.7821 |
| 0.7282 | 0.24 | 250 | 0.7093 | -1.2326 | -1.3686 | 0.5648 | 0.1360 | -26.1195 | -24.9010 | 13.8037 | 13.8067 |
| 0.6924 | 0.29 | 300 | 0.6944 | -0.7898 | -0.9655 | 0.5626 | 0.1757 | -24.7758 | -23.4250 | 14.0496 | 14.0528 |
| 0.7523 | 0.34 | 350 | 0.6909 | -0.9371 | -1.1226 | 0.5626 | 0.1855 | -25.2994 | -23.9158 | 14.0003 | 14.0037 |
| 0.7276 | 0.39 | 400 | 0.6918 | -1.8471 | -2.0415 | 0.5868 | 0.1944 | -28.3625 | -26.9492 | 13.3382 | 13.3414 |
| 0.6255 | 0.44 | 450 | 0.6860 | -1.5470 | -1.7599 | 0.5934 | 0.2129 | -27.4236 | -25.9489 | 13.2551 | 13.2584 |
| 0.7342 | 0.49 | 500 | 0.6801 | -1.5841 | -1.7888 | 0.5758 | 0.2046 | -27.5199 | -26.0726 | 13.4186 | 13.4219 |
| 0.568 | 0.54 | 550 | 0.6694 | -1.5101 | -1.7458 | 0.6022 | 0.2356 | -27.3766 | -25.8260 | 13.5776 | 13.5810 |
| 0.6217 | 0.59 | 600 | 0.6645 | -1.4050 | -1.6543 | 0.6110 | 0.2492 | -27.0716 | -25.4756 | 13.6337 | 13.6371 |
| 0.6186 | 0.64 | 650 | 0.6682 | -1.3826 | -1.6291 | 0.5978 | 0.2465 | -26.9876 | -25.4007 | 13.4204 | 13.4237 |
| 0.6637 | 0.68 | 700 | 0.6633 | -1.3994 | -1.6501 | 0.6220 | 0.2507 | -27.0576 | -25.4569 | 13.4574 | 13.4608 |
| 0.7482 | 0.73 | 750 | 0.6632 | -1.3772 | -1.6269 | 0.6198 | 0.2497 | -26.9804 | -25.3829 | 13.4047 | 13.4081 |
| 0.6597 | 0.78 | 800 | 0.6627 | -1.3970 | -1.6527 | 0.6198 | 0.2557 | -27.0664 | -25.4489 | 13.3914 | 13.3948 |
| 0.7206 | 0.83 | 850 | 0.6613 | -1.4018 | -1.6593 | 0.6220 | 0.2575 | -27.0885 | -25.4648 | 13.3862 | 13.3896 |
| 0.6715 | 0.88 | 900 | 0.6633 | -1.4047 | -1.6584 | 0.6220 | 0.2537 | -27.0856 | -25.4746 | 13.3969 | 13.4003 |
| 0.6108 | 0.93 | 950 | 0.6633 | -1.4042 | -1.6585 | 0.6242 | 0.2543 | -27.0857 | -25.4727 | 13.3883 | 13.3917 |
| 0.5964 | 0.98 | 1000 | 0.6641 | -1.4066 | -1.6576 | 0.6198 | 0.2510 | -27.0829 | -25.4808 | 13.3887 | 13.3921 |

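The table can be sanity-checked from its own columns. The card does not list the DPO β; the model name (`..._03_beta_DPO`) suggests β = 0.3, which this sketch assumes. If each reward is β times (policy log-prob minus reference log-prob), then every row implies a reference-model log-prob, and those should stay constant across evaluations because the reference model is frozen and the evaluation set is fixed. The sketch also checks that `Rewards/margins` equals `Rewards/chosen - Rewards/rejected` up to rounding. `ROWS`, `BETA`, and `implied_reference_logp` are illustrative names.

```python
# Columns copied from the table above:
# (step, rewards_chosen, rewards_rejected, rewards_margins, logps_rejected, logps_chosen)
ROWS = [
    (50, 0.0510, 0.0490, 0.0019, -21.3940, -20.6223),
    (100, 0.0934, 0.0501, 0.0433, -21.3906, -20.4809),
    (150, -0.0147, -0.0834, 0.0688, -21.8355, -20.8411),
    (200, 0.4903, 0.3962, 0.0941, -20.2368, -19.1579),
    (250, -1.2326, -1.3686, 0.1360, -26.1195, -24.9010),
    (300, -0.7898, -0.9655, 0.1757, -24.7758, -23.4250),
    (350, -0.9371, -1.1226, 0.1855, -25.2994, -23.9158),
    (400, -1.8471, -2.0415, 0.1944, -28.3625, -26.9492),
    (450, -1.5470, -1.7599, 0.2129, -27.4236, -25.9489),
    (500, -1.5841, -1.7888, 0.2046, -27.5199, -26.0726),
    (550, -1.5101, -1.7458, 0.2356, -27.3766, -25.8260),
    (600, -1.4050, -1.6543, 0.2492, -27.0716, -25.4756),
    (650, -1.3826, -1.6291, 0.2465, -26.9876, -25.4007),
    (700, -1.3994, -1.6501, 0.2507, -27.0576, -25.4569),
    (750, -1.3772, -1.6269, 0.2497, -26.9804, -25.3829),
    (800, -1.3970, -1.6527, 0.2557, -27.0664, -25.4489),
    (850, -1.4018, -1.6593, 0.2575, -27.0885, -25.4648),
    (900, -1.4047, -1.6584, 0.2537, -27.0856, -25.4746),
    (950, -1.4042, -1.6585, 0.2543, -27.0857, -25.4727),
    (1000, -1.4066, -1.6576, 0.2510, -27.0829, -25.4808),
]

BETA = 0.3          # assumption: inferred from the model name, not listed in the card
MARGIN_TOL = 5e-4   # table values are rounded to 4 decimals


def implied_reference_logp(beta: float, reward: float, policy_logp: float) -> float:
    """Invert reward = beta * (policy_logp - ref_logp) to recover ref_logp."""
    return policy_logp - reward / beta


ref_chosen, ref_rejected = [], []
for step, r_ch, r_rej, margin, lp_rej, lp_ch in ROWS:
    # Rewards/margins should equal chosen minus rejected, up to rounding.
    assert abs((r_ch - r_rej) - margin) < MARGIN_TOL, step
    ref_chosen.append(implied_reference_logp(BETA, r_ch, lp_ch))
    ref_rejected.append(implied_reference_logp(BETA, r_rej, lp_rej))

# With a frozen reference model and a fixed eval set, these ranges should be tiny.
print(f"ref chosen:   {min(ref_chosen):.4f} .. {max(ref_chosen):.4f}")
print(f"ref rejected: {min(ref_rejected):.4f} .. {max(ref_rejected):.4f}")
```

With β = 0.3, both implied reference log-probs stay within about 0.0005 across all 20 evaluations (roughly -20.792 for chosen and -21.557 for rejected). Plugging in β = 0.1 instead makes them drift by more than ten, which supports reading `03` in the name as 0.3.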
### Framework versions

- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
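To compare a local environment against the versions above, a small check with the standard library's `importlib.metadata` may help. It assumes the PyPI distribution names `transformers`, `torch` (for PyTorch), `datasets`, and `tokenizers`, and it ignores local build suffixes such as `+cu117`, so it does not verify the CUDA build. The helper names are illustrative.

```python
from __future__ import annotations

from importlib import metadata

# Versions listed in the card; PyTorch is published on PyPI as "torch".
CARD_VERSIONS = {
    "transformers": "4.39.1",
    "torch": "2.0.0+cu117",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu117'."""
    return version.split("+", 1)[0]


def version_mismatches(expected: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Map each package whose installed base version differs to (expected, installed or None)."""
    mismatches: dict[str, tuple[str, str | None]] = {}
    for name, want in expected.items():
        try:
            have: str | None = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None  # package is not installed
        if have is None or base_version(have) != base_version(want):
            mismatches[name] = (want, have)
    return mismatches


for name, (want, have) in version_mismatches(CARD_VERSIONS).items():
    print(f"{name}: card lists {want}, found {have or 'not installed'}")
```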