yimingzhang committed
Commit 02e14d0 · 1 Parent(s): 7239dad

Model save
README.md CHANGED
@@ -15,15 +15,15 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the None dataset.
 It achieves the following results on the evaluation set:
- - Loss: 0.6825
- - Rewards/chosen: -2.3281
- - Rewards/rejected: -6.0108
- - Rewards/accuracies: 0.7620
- - Rewards/margins: 3.6827
- - Logps/rejected: -344.2271
- - Logps/chosen: -376.1471
- - Logits/rejected: -2.7465
- - Logits/chosen: -2.7672
+ - Loss: 0.6013
+ - Rewards/chosen: -1.9595
+ - Rewards/rejected: -8.0100
+ - Rewards/accuracies: 0.8120
+ - Rewards/margins: 6.0505
+ - Logps/rejected: -356.2374
+ - Logps/chosen: -267.2756
+ - Logits/rejected: -2.8085
+ - Logits/chosen: -2.7462
 
 ## Model description
 
@@ -43,14 +43,13 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-07
- - train_batch_size: 4
- - eval_batch_size: 2
+ - train_batch_size: 8
+ - eval_batch_size: 4
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 64
- - total_eval_batch_size: 8
+ - total_train_batch_size: 32
+ - total_eval_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.1
@@ -60,35 +59,64 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.5513 | 0.1 | 100 | 0.5431 | 0.4641 | -0.1955 | 0.7660 | 0.6596 | -286.0747 | -348.2250 | -2.9049 | -2.9199 |
- | 0.5322 | 0.21 | 200 | 0.5251 | 0.4311 | -0.7452 | 0.7680 | 1.1764 | -291.5714 | -348.5544 | -2.9307 | -2.9476 |
- | 0.602 | 0.31 | 300 | 0.5439 | 0.5364 | -0.6172 | 0.7440 | 1.1536 | -290.2913 | -347.5018 | -2.9635 | -2.9952 |
- | 0.5809 | 0.41 | 400 | 0.5436 | 0.3841 | -0.8922 | 0.7600 | 1.2762 | -293.0408 | -349.0252 | -2.9205 | -2.9360 |
- | 0.5164 | 0.52 | 500 | 0.5406 | 0.3006 | -1.1405 | 0.7660 | 1.4410 | -295.5239 | -349.8603 | -3.0117 | -3.0160 |
- | 0.5957 | 0.62 | 600 | 0.5336 | 0.1952 | -1.0296 | 0.7380 | 1.2247 | -294.4149 | -350.9140 | -2.9353 | -2.9459 |
- | 0.6516 | 0.72 | 700 | 0.5292 | 0.2224 | -1.0543 | 0.7520 | 1.2767 | -294.6618 | -350.6416 | -2.9889 | -2.9857 |
- | 0.5353 | 0.83 | 800 | 0.5145 | 0.1103 | -1.4103 | 0.7560 | 1.5206 | -298.2225 | -351.7629 | -2.9116 | -2.9171 |
- | 0.5293 | 0.93 | 900 | 0.5146 | 0.1384 | -1.3157 | 0.7620 | 1.4541 | -297.2768 | -351.4819 | -2.9463 | -2.9537 |
- | 0.1078 | 1.03 | 1000 | 0.5121 | 0.3042 | -1.6532 | 0.7780 | 1.9574 | -300.6510 | -349.8238 | -2.8942 | -2.9057 |
- | 0.0928 | 1.14 | 1100 | 0.5322 | 0.0212 | -2.1481 | 0.7780 | 2.1692 | -305.5999 | -352.6541 | -2.8911 | -2.9062 |
- | 0.1295 | 1.24 | 1200 | 0.5384 | -0.2033 | -2.3411 | 0.7760 | 2.1378 | -307.5298 | -354.8984 | -2.9192 | -2.9265 |
- | 0.1093 | 1.34 | 1300 | 0.5469 | -0.2341 | -2.5125 | 0.7860 | 2.2784 | -309.2440 | -355.2066 | -2.8709 | -2.8807 |
- | 0.1198 | 1.45 | 1400 | 0.5245 | -0.3902 | -2.4518 | 0.7720 | 2.0616 | -308.6370 | -356.7679 | -2.8527 | -2.8626 |
- | 0.1122 | 1.55 | 1500 | 0.5524 | -0.6778 | -3.1489 | 0.7860 | 2.4710 | -315.6078 | -359.6443 | -2.8319 | -2.8394 |
- | 0.11 | 1.65 | 1600 | 0.5355 | -0.4967 | -2.7524 | 0.7780 | 2.2557 | -311.6435 | -357.8331 | -2.8461 | -2.8559 |
- | 0.1092 | 1.76 | 1700 | 0.5581 | -0.6067 | -3.1170 | 0.7800 | 2.5102 | -315.2888 | -358.9333 | -2.8502 | -2.8601 |
- | 0.0958 | 1.86 | 1800 | 0.5647 | -0.7279 | -3.2484 | 0.7760 | 2.5205 | -316.6035 | -360.1446 | -2.8474 | -2.8541 |
- | 0.122 | 1.96 | 1900 | 0.5520 | -0.7738 | -3.0618 | 0.7780 | 2.2880 | -314.7370 | -360.6041 | -2.8707 | -2.8743 |
- | 0.0242 | 2.07 | 2000 | 0.6111 | -1.3091 | -4.2093 | 0.7640 | 2.9002 | -326.2119 | -365.9565 | -2.8475 | -2.8605 |
- | 0.017 | 2.17 | 2100 | 0.6473 | -1.7027 | -4.9317 | 0.7620 | 3.2289 | -333.4358 | -369.8930 | -2.8138 | -2.8309 |
- | 0.0153 | 2.27 | 2200 | 0.6658 | -2.0049 | -5.3885 | 0.7700 | 3.3837 | -338.0045 | -372.9144 | -2.7872 | -2.8057 |
- | 0.0215 | 2.38 | 2300 | 0.6722 | -1.7556 | -5.2164 | 0.7780 | 3.4609 | -336.2837 | -370.4218 | -2.8210 | -2.8406 |
- | 0.0128 | 2.48 | 2400 | 0.6772 | -2.0766 | -5.6174 | 0.7700 | 3.5408 | -340.2928 | -373.6317 | -2.7783 | -2.8011 |
- | 0.015 | 2.58 | 2500 | 0.6893 | -2.2855 | -5.9178 | 0.7740 | 3.6323 | -343.2976 | -375.7208 | -2.7657 | -2.7877 |
- | 0.0118 | 2.69 | 2600 | 0.6937 | -2.5052 | -6.1919 | 0.7660 | 3.6867 | -346.0385 | -377.9179 | -2.7766 | -2.7963 |
- | 0.0127 | 2.79 | 2700 | 0.6868 | -2.4883 | -6.1501 | 0.7700 | 3.6618 | -345.6202 | -377.7486 | -2.7437 | -2.7652 |
- | 0.0149 | 2.89 | 2800 | 0.6852 | -2.3777 | -6.0421 | 0.7700 | 3.6644 | -344.5401 | -376.6426 | -2.7409 | -2.7623 |
- | 0.0105 | 3.0 | 2900 | 0.6832 | -2.3298 | -6.0137 | 0.7640 | 3.6839 | -344.2563 | -376.1639 | -2.7455 | -2.7663 |
+ | 0.5613 | 0.05 | 100 | 0.5542 | 0.4616 | 0.0165 | 0.7380 | 0.4451 | -275.9723 | -243.0639 | -2.9495 | -2.9048 |
+ | 0.4215 | 0.1 | 200 | 0.4627 | 0.4975 | -0.6989 | 0.7840 | 1.1965 | -283.1268 | -242.7047 | -2.9388 | -2.8915 |
+ | 0.4508 | 0.15 | 300 | 0.4707 | 0.4510 | -1.1860 | 0.7840 | 1.6370 | -287.9977 | -243.1706 | -2.9512 | -2.9006 |
+ | 0.5348 | 0.21 | 400 | 0.4709 | 0.3351 | -1.7399 | 0.8040 | 2.0750 | -293.5365 | -244.3292 | -3.0053 | -2.9561 |
+ | 0.4742 | 0.26 | 500 | 0.5065 | 0.3952 | -1.7944 | 0.8220 | 2.1896 | -294.0814 | -243.7279 | -3.1011 | -3.0500 |
+ | 0.6062 | 0.31 | 600 | 0.4503 | 0.4052 | -1.9035 | 0.7980 | 2.3087 | -295.1721 | -243.6278 | -3.0394 | -2.9736 |
+ | 0.4228 | 0.36 | 700 | 0.5026 | -0.0483 | -2.6359 | 0.8200 | 2.5877 | -302.4969 | -248.1629 | -3.0659 | -2.9973 |
+ | 0.5396 | 0.41 | 800 | 0.4615 | 0.5120 | -2.0322 | 0.8060 | 2.5442 | -296.4592 | -242.5603 | -2.9105 | -2.8560 |
+ | 0.5377 | 0.46 | 900 | 0.4913 | 0.5025 | -1.9568 | 0.7960 | 2.4593 | -295.7052 | -242.6552 | -2.9651 | -2.9045 |
+ | 0.4886 | 0.52 | 1000 | 0.4495 | 0.0867 | -2.7909 | 0.8060 | 2.8776 | -304.0464 | -246.8128 | -2.9735 | -2.8935 |
+ | 0.4447 | 0.57 | 1100 | 0.4398 | 0.3296 | -2.4020 | 0.8100 | 2.7316 | -300.1573 | -244.3844 | -2.8707 | -2.7943 |
+ | 0.4971 | 0.62 | 1200 | 0.4412 | 0.5074 | -2.2162 | 0.7940 | 2.7236 | -298.2993 | -242.6058 | -2.8602 | -2.7825 |
+ | 0.5218 | 0.67 | 1300 | 0.4986 | 0.4726 | -2.3083 | 0.7960 | 2.7809 | -299.2201 | -242.9541 | -2.9537 | -2.8866 |
+ | 0.6129 | 0.72 | 1400 | 0.4818 | 0.5578 | -2.2246 | 0.8080 | 2.7824 | -298.3839 | -242.1022 | -3.0072 | -2.9438 |
+ | 0.3862 | 0.77 | 1500 | 0.4689 | 0.3254 | -2.6525 | 0.8140 | 2.9779 | -302.6622 | -244.4263 | -2.8976 | -2.8354 |
+ | 0.4186 | 0.83 | 1600 | 0.4497 | 0.3061 | -2.9514 | 0.8040 | 3.2575 | -305.6511 | -244.6188 | -2.9207 | -2.8589 |
+ | 0.4765 | 0.88 | 1700 | 0.4296 | 0.3788 | -2.6225 | 0.8060 | 3.0012 | -302.3619 | -243.8926 | -2.9836 | -2.9241 |
+ | 0.4783 | 0.93 | 1800 | 0.4422 | 0.0944 | -2.9868 | 0.8040 | 3.0812 | -306.0055 | -246.7358 | -2.9534 | -2.8865 |
+ | 0.465 | 0.98 | 1900 | 0.4434 | 0.5028 | -2.3326 | 0.7960 | 2.8354 | -299.4631 | -242.6521 | -2.9355 | -2.8713 |
+ | 0.0921 | 1.03 | 2000 | 0.4447 | 0.1567 | -3.4476 | 0.8120 | 3.6043 | -310.6131 | -246.1128 | -2.8519 | -2.7858 |
+ | 0.0776 | 1.08 | 2100 | 0.4776 | 0.0909 | -3.9422 | 0.8140 | 4.0330 | -315.5593 | -246.7717 | -2.8412 | -2.7763 |
+ | 0.0679 | 1.14 | 2200 | 0.4770 | -0.6731 | -4.8208 | 0.8240 | 4.1477 | -324.3449 | -254.4110 | -2.8085 | -2.7446 |
+ | 0.0696 | 1.19 | 2300 | 0.4886 | -0.0248 | -4.1796 | 0.8160 | 4.1548 | -317.9334 | -247.9280 | -2.8622 | -2.8014 |
+ | 0.1026 | 1.24 | 2400 | 0.4862 | 0.1088 | -3.8957 | 0.8160 | 4.0044 | -315.0940 | -246.5922 | -2.8702 | -2.8103 |
+ | 0.104 | 1.29 | 2500 | 0.5141 | -0.6043 | -5.0727 | 0.8080 | 4.4684 | -326.8640 | -253.7228 | -2.8105 | -2.7535 |
+ | 0.0728 | 1.34 | 2600 | 0.5166 | -0.5809 | -4.9937 | 0.8080 | 4.4128 | -326.0744 | -253.4896 | -2.8659 | -2.8016 |
+ | 0.0844 | 1.39 | 2700 | 0.4835 | -0.6211 | -4.6437 | 0.8160 | 4.0226 | -322.5744 | -253.8915 | -2.8901 | -2.8305 |
+ | 0.0733 | 1.45 | 2800 | 0.4738 | -0.1863 | -4.1760 | 0.8120 | 3.9897 | -317.8976 | -249.5429 | -2.9311 | -2.8814 |
+ | 0.1837 | 1.5 | 2900 | 0.4764 | -0.0201 | -4.2761 | 0.8060 | 4.2560 | -318.8984 | -247.8809 | -2.9295 | -2.8720 |
+ | 0.2113 | 1.55 | 3000 | 0.4709 | -0.0570 | -3.9772 | 0.8080 | 3.9202 | -315.9093 | -248.2498 | -2.8978 | -2.8435 |
+ | 0.1858 | 1.6 | 3100 | 0.4769 | -0.1959 | -4.2238 | 0.7960 | 4.0278 | -318.3751 | -249.6395 | -2.9043 | -2.8498 |
+ | 0.095 | 1.65 | 3200 | 0.4939 | -0.3083 | -4.3033 | 0.8120 | 3.9950 | -319.1705 | -250.7627 | -2.9288 | -2.8688 |
+ | 0.1147 | 1.7 | 3300 | 0.4897 | -0.4599 | -4.7081 | 0.8080 | 4.2482 | -323.2183 | -252.2793 | -2.9112 | -2.8484 |
+ | 0.1677 | 1.76 | 3400 | 0.4930 | -0.7465 | -5.1191 | 0.8200 | 4.3726 | -327.3288 | -255.1453 | -2.8408 | -2.7809 |
+ | 0.0581 | 1.81 | 3500 | 0.4859 | -0.2916 | -4.5176 | 0.8180 | 4.2259 | -321.3130 | -250.5966 | -2.8749 | -2.8191 |
+ | 0.053 | 1.86 | 3600 | 0.4978 | -0.6092 | -5.0514 | 0.8220 | 4.4422 | -326.6519 | -253.7722 | -2.8885 | -2.8300 |
+ | 0.0603 | 1.91 | 3700 | 0.4830 | -0.7539 | -5.0723 | 0.8060 | 4.3184 | -326.8602 | -255.2187 | -2.8710 | -2.8075 |
+ | 0.1269 | 1.96 | 3800 | 0.4793 | -0.4331 | -4.5194 | 0.8160 | 4.0863 | -321.3315 | -252.0114 | -2.9121 | -2.8554 |
+ | 0.0191 | 2.01 | 3900 | 0.4803 | -0.4886 | -4.9886 | 0.8160 | 4.5000 | -326.0231 | -252.5659 | -2.8857 | -2.8246 |
+ | 0.0168 | 2.07 | 4000 | 0.5259 | -1.0235 | -6.1251 | 0.8060 | 5.1016 | -337.3882 | -257.9146 | -2.8419 | -2.7775 |
+ | 0.0114 | 2.12 | 4100 | 0.5714 | -1.5737 | -7.0255 | 0.8140 | 5.4519 | -346.3929 | -263.4171 | -2.8249 | -2.7582 |
+ | 0.0114 | 2.17 | 4200 | 0.5547 | -1.8288 | -7.2840 | 0.8020 | 5.4552 | -348.9774 | -265.9677 | -2.8102 | -2.7409 |
+ | 0.0482 | 2.22 | 4300 | 0.5437 | -1.1582 | -6.4741 | 0.8140 | 5.3159 | -340.8786 | -259.2626 | -2.8513 | -2.7874 |
+ | 0.0172 | 2.27 | 4400 | 0.5489 | -1.5961 | -7.1623 | 0.8100 | 5.5662 | -347.7602 | -263.6409 | -2.8474 | -2.7836 |
+ | 0.1044 | 2.32 | 4500 | 0.5818 | -1.8548 | -7.7495 | 0.8140 | 5.8947 | -353.6325 | -266.2277 | -2.8482 | -2.7839 |
+ | 0.012 | 2.37 | 4600 | 0.5813 | -1.6912 | -7.5587 | 0.8160 | 5.8675 | -351.7242 | -264.5919 | -2.8512 | -2.7866 |
+ | 0.0122 | 2.43 | 4700 | 0.6052 | -2.2384 | -8.3688 | 0.8060 | 6.1304 | -359.8252 | -270.0639 | -2.8210 | -2.7558 |
+ | 0.0636 | 2.48 | 4800 | 0.5867 | -1.8483 | -7.7813 | 0.8140 | 5.9330 | -353.9502 | -266.1630 | -2.8455 | -2.7797 |
+ | 0.0125 | 2.53 | 4900 | 0.5878 | -1.9082 | -7.7997 | 0.8140 | 5.8915 | -354.1346 | -266.7619 | -2.8342 | -2.7687 |
+ | 0.0105 | 2.58 | 5000 | 0.5969 | -2.1624 | -8.2116 | 0.8120 | 6.0492 | -358.2536 | -269.3045 | -2.8144 | -2.7498 |
+ | 0.0207 | 2.63 | 5100 | 0.6008 | -2.1674 | -8.2218 | 0.8120 | 6.0544 | -358.3557 | -269.3546 | -2.8197 | -2.7557 |
+ | 0.0103 | 2.68 | 5200 | 0.6214 | -2.3910 | -8.6148 | 0.8060 | 6.2238 | -362.2856 | -271.5901 | -2.8181 | -2.7546 |
+ | 0.0035 | 2.74 | 5300 | 0.6090 | -2.3006 | -8.4330 | 0.8120 | 6.1324 | -360.4677 | -270.6860 | -2.8048 | -2.7436 |
+ | 0.0145 | 2.79 | 5400 | 0.6056 | -2.1076 | -8.1956 | 0.8120 | 6.0880 | -358.0930 | -268.7557 | -2.8059 | -2.7451 |
+ | 0.0115 | 2.84 | 5500 | 0.5965 | -2.0098 | -7.9907 | 0.8160 | 5.9809 | -356.0446 | -267.7783 | -2.8139 | -2.7522 |
+ | 0.0321 | 2.89 | 5600 | 0.6051 | -2.0432 | -8.1034 | 0.8080 | 6.0602 | -357.1714 | -268.1118 | -2.8136 | -2.7510 |
+ | 0.0087 | 2.94 | 5700 | 0.6041 | -2.0226 | -8.0892 | 0.8140 | 6.0666 | -357.0298 | -267.9061 | -2.8100 | -2.7475 |
+ | 0.0057 | 2.99 | 5800 | 0.6031 | -1.9575 | -8.0080 | 0.8140 | 6.0505 | -356.2176 | -267.2556 | -2.8082 | -2.7457 |
 
 
 ### Framework versions
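The changed batch-size fields above are internally consistent: the total batch size reported by the trainer is per-device batch size × number of devices × gradient accumulation steps (1 when the field is absent). A quick check of both configurations (the helper name is illustrative, not part of the training code):

```python
def total_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Effective batch size = per-device batch x devices x gradient accumulation."""
    return per_device * num_devices * grad_accum

# Previous run: train_batch_size=4, 4 GPUs, gradient_accumulation_steps=4
assert total_batch_size(4, 4, 4) == 64   # total_train_batch_size: 64
# Current run: train_batch_size=8, 4 GPUs, no gradient accumulation
assert total_batch_size(8, 4) == 32      # total_train_batch_size: 32
# Eval side: 2 x 4 = 8 (previous), 4 x 4 = 16 (current)
assert total_batch_size(2, 4) == 8
assert total_batch_size(4, 4) == 16
```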
all_results.json CHANGED
@@ -1,21 +1,21 @@
 {
     "epoch": 3.0,
-    "eval_logits/chosen": -2.767199754714966,
-    "eval_logits/rejected": -2.746457099914551,
-    "eval_logps/chosen": -376.1471252441406,
-    "eval_logps/rejected": -344.22711181640625,
-    "eval_loss": 0.6825046539306641,
-    "eval_rewards/accuracies": 0.7620000243186951,
-    "eval_rewards/chosen": -2.328126907348633,
-    "eval_rewards/margins": 3.682659149169922,
-    "eval_rewards/rejected": -6.0107855796813965,
-    "eval_runtime": 500.0939,
+    "eval_logits/chosen": -2.746201753616333,
+    "eval_logits/rejected": -2.8084917068481445,
+    "eval_logps/chosen": -267.275634765625,
+    "eval_logps/rejected": -356.2374267578125,
+    "eval_loss": 0.6013044714927673,
+    "eval_rewards/accuracies": 0.8119999766349792,
+    "eval_rewards/chosen": -1.9595460891723633,
+    "eval_rewards/margins": 6.0504584312438965,
+    "eval_rewards/rejected": -8.010004043579102,
+    "eval_runtime": 278.5463,
     "eval_samples": 2000,
-    "eval_samples_per_second": 3.999,
-    "eval_steps_per_second": 0.5,
-    "train_loss": 0.2301063642942298,
-    "train_runtime": 127972.4677,
+    "eval_samples_per_second": 7.18,
+    "eval_steps_per_second": 0.449,
+    "train_loss": 0.19806672207460788,
+    "train_runtime": 74526.9689,
     "train_samples": 61966,
-    "train_samples_per_second": 1.453,
-    "train_steps_per_second": 0.023
+    "train_samples_per_second": 2.494,
+    "train_steps_per_second": 0.078
 }
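The reward metrics hang together: in DPO-style logging, `eval_rewards/margins` is the mean of (chosen reward − rejected reward). A sketch checking the new run's numbers; agreement is only to float32 precision, since the trainer accumulates metrics in float32:

```python
# Values from the updated all_results.json
chosen = -1.9595460891723633    # eval_rewards/chosen
rejected = -8.010004043579102   # eval_rewards/rejected
margin = 6.0504584312438965     # eval_rewards/margins

# chosen - rejected reproduces the reported margin up to float32 rounding
assert abs((chosen - rejected) - margin) < 1e-5
```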
eval_results.json CHANGED
@@ -1,16 +1,16 @@
 {
     "epoch": 3.0,
-    "eval_logits/chosen": -2.767199754714966,
-    "eval_logits/rejected": -2.746457099914551,
-    "eval_logps/chosen": -376.1471252441406,
-    "eval_logps/rejected": -344.22711181640625,
-    "eval_loss": 0.6825046539306641,
-    "eval_rewards/accuracies": 0.7620000243186951,
-    "eval_rewards/chosen": -2.328126907348633,
-    "eval_rewards/margins": 3.682659149169922,
-    "eval_rewards/rejected": -6.0107855796813965,
-    "eval_runtime": 500.0939,
+    "eval_logits/chosen": -2.746201753616333,
+    "eval_logits/rejected": -2.8084917068481445,
+    "eval_logps/chosen": -267.275634765625,
+    "eval_logps/rejected": -356.2374267578125,
+    "eval_loss": 0.6013044714927673,
+    "eval_rewards/accuracies": 0.8119999766349792,
+    "eval_rewards/chosen": -1.9595460891723633,
+    "eval_rewards/margins": 6.0504584312438965,
+    "eval_rewards/rejected": -8.010004043579102,
+    "eval_runtime": 278.5463,
     "eval_samples": 2000,
-    "eval_samples_per_second": 3.999,
-    "eval_steps_per_second": 0.5
+    "eval_samples_per_second": 7.18,
+    "eval_steps_per_second": 0.449
 }
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:808ef2696c7407dc012456bc23cff3316e83d2b2aa06406e4e5beccc78c0af4a
+oid sha256:30799b1c1ca3ea668f06802ba4955898fa0b5db1587e5631c64a1a254103153d
 size 4943162336
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bf2c7818d56f60049599171a58d8c9af0bec037d361659d10e8430918d3924ee
+oid sha256:77e03a3d268216d271df0c9f332883fd3d0d0c00b86ae884cc9869ebbfaef0d1
 size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e04cd2313acd3728d13597167a19d70a28d841d2c6c0ac006c52d1c257ba550f
+oid sha256:55bcaaaffe978c185af8fbb3177120a0e1b25d723c972a6f82d3b9c330f17e1c
 size 4540516344
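The `.safetensors` diffs above are not the weights themselves but Git LFS pointer files: three `key value` lines giving the spec version, the SHA-256 object id, and the blob size in bytes. A minimal parser sketch (the `parse_lfs_pointer` helper is illustrative; the sample pointer is the new model-00001 shard from the diff above):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a git-lfs pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:30799b1c1ca3ea668f06802ba4955898fa0b5db1587e5631c64a1a254103153d
size 4943162336
"""
info = parse_lfs_pointer(pointer)
assert info["version"] == "https://git-lfs.github.com/spec/v1"
assert info["oid"].startswith("sha256:")
assert info["size"] == "4943162336"  # shard size in bytes, unchanged by this commit
```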
runs/Dec27_17-59-29_babel-5-3/events.out.tfevents.1703718038.babel-5-3.969100.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a803b10ddbdd470fd7c864c27ea27472cfac7e4452455c59ca66d8fd8316a46f
+size 416385
runs/Dec27_17-59-29_babel-5-3/events.out.tfevents.1703792843.babel-5-3.969100.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:63a746fb124780ba5a8ab8f708f822d1b02bdbd4536e13ef2915e7add757e694
+size 828
train_results.json CHANGED
@@ -1,8 +1,8 @@
 {
     "epoch": 3.0,
-    "train_loss": 0.2301063642942298,
-    "train_runtime": 127972.4677,
+    "train_loss": 0.19806672207460788,
+    "train_runtime": 74526.9689,
     "train_samples": 61966,
-    "train_samples_per_second": 1.453,
-    "train_steps_per_second": 0.023
+    "train_samples_per_second": 2.494,
+    "train_steps_per_second": 0.078
 }
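The throughput fields in both versions of train_results.json follow from the sample count and runtime: `train_samples_per_second ≈ epochs × train_samples / train_runtime`. A quick check against both runs, which also confirms the new run is roughly 1.7× faster end to end:

```python
train_samples, epochs = 61966, 3.0

# Current run: 74526.9689 s of training
assert round(epochs * train_samples / 74526.9689, 3) == 2.494
# Previous run: 127972.4677 s of training
assert round(epochs * train_samples / 127972.4677, 3) == 1.453
```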
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff
 
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c69ce8eb3319a6a0f5353fe9736ff2237169300eb2f111f39b12712ae8d33af6
-size 5752
+oid sha256:49bac97a38d0e5fbdaa25a18765e70e6e313e87821bacfe0b97cc49ca8296f79
+size 5688