Commit 02e14d01
Parent: 7239dad

Model save

Files changed:
- README.md (+71 −43)
- all_results.json (+16 −16)
- eval_results.json (+12 −12)
- model-00001-of-00003.safetensors (+1 −1)
- model-00002-of-00003.safetensors (+1 −1)
- model-00003-of-00003.safetensors (+1 −1)
- runs/Dec27_17-59-29_babel-5-3/events.out.tfevents.1703718038.babel-5-3.969100.0 (+3 −0)
- runs/Dec27_17-59-29_babel-5-3/events.out.tfevents.1703792843.babel-5-3.969100.1 (+3 −0)
- train_results.json (+4 −4)
- trainer_state.json (+0 −0)
- training_args.bin (+2 −2)
README.md
CHANGED
@@ -15,15 +15,15 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
-- Rewards/chosen: -
-- Rewards/rejected: -
-- Rewards/accuracies: 0.
-- Rewards/margins:
-- Logps/rejected: -
-- Logps/chosen: -
-- Logits/rejected: -2.
-- Logits/chosen: -2.
 
 ## Model description
 
@@ -43,14 +43,13 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-07
-- train_batch_size:
-- eval_batch_size:
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
--
--
-- total_eval_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.1
 
@@ -60,35 +59,64 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-(29 removed table rows; each value is truncated to "| 0." in this rendering)
 
 ### Framework versions
 
 This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.6013
+- Rewards/chosen: -1.9595
+- Rewards/rejected: -8.0100
+- Rewards/accuracies: 0.8120
+- Rewards/margins: 6.0505
+- Logps/rejected: -356.2374
+- Logps/chosen: -267.2756
+- Logits/rejected: -2.8085
+- Logits/chosen: -2.7462
 
 ## Model description
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-07
+- train_batch_size: 8
+- eval_batch_size: 4
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
+- total_train_batch_size: 32
+- total_eval_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.1
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.5613 | 0.05 | 100 | 0.5542 | 0.4616 | 0.0165 | 0.7380 | 0.4451 | -275.9723 | -243.0639 | -2.9495 | -2.9048 |
+| 0.4215 | 0.1 | 200 | 0.4627 | 0.4975 | -0.6989 | 0.7840 | 1.1965 | -283.1268 | -242.7047 | -2.9388 | -2.8915 |
+| 0.4508 | 0.15 | 300 | 0.4707 | 0.4510 | -1.1860 | 0.7840 | 1.6370 | -287.9977 | -243.1706 | -2.9512 | -2.9006 |
+| 0.5348 | 0.21 | 400 | 0.4709 | 0.3351 | -1.7399 | 0.8040 | 2.0750 | -293.5365 | -244.3292 | -3.0053 | -2.9561 |
+| 0.4742 | 0.26 | 500 | 0.5065 | 0.3952 | -1.7944 | 0.8220 | 2.1896 | -294.0814 | -243.7279 | -3.1011 | -3.0500 |
+| 0.6062 | 0.31 | 600 | 0.4503 | 0.4052 | -1.9035 | 0.7980 | 2.3087 | -295.1721 | -243.6278 | -3.0394 | -2.9736 |
+| 0.4228 | 0.36 | 700 | 0.5026 | -0.0483 | -2.6359 | 0.8200 | 2.5877 | -302.4969 | -248.1629 | -3.0659 | -2.9973 |
+| 0.5396 | 0.41 | 800 | 0.4615 | 0.5120 | -2.0322 | 0.8060 | 2.5442 | -296.4592 | -242.5603 | -2.9105 | -2.8560 |
+| 0.5377 | 0.46 | 900 | 0.4913 | 0.5025 | -1.9568 | 0.7960 | 2.4593 | -295.7052 | -242.6552 | -2.9651 | -2.9045 |
+| 0.4886 | 0.52 | 1000 | 0.4495 | 0.0867 | -2.7909 | 0.8060 | 2.8776 | -304.0464 | -246.8128 | -2.9735 | -2.8935 |
+| 0.4447 | 0.57 | 1100 | 0.4398 | 0.3296 | -2.4020 | 0.8100 | 2.7316 | -300.1573 | -244.3844 | -2.8707 | -2.7943 |
+| 0.4971 | 0.62 | 1200 | 0.4412 | 0.5074 | -2.2162 | 0.7940 | 2.7236 | -298.2993 | -242.6058 | -2.8602 | -2.7825 |
+| 0.5218 | 0.67 | 1300 | 0.4986 | 0.4726 | -2.3083 | 0.7960 | 2.7809 | -299.2201 | -242.9541 | -2.9537 | -2.8866 |
+| 0.6129 | 0.72 | 1400 | 0.4818 | 0.5578 | -2.2246 | 0.8080 | 2.7824 | -298.3839 | -242.1022 | -3.0072 | -2.9438 |
+| 0.3862 | 0.77 | 1500 | 0.4689 | 0.3254 | -2.6525 | 0.8140 | 2.9779 | -302.6622 | -244.4263 | -2.8976 | -2.8354 |
+| 0.4186 | 0.83 | 1600 | 0.4497 | 0.3061 | -2.9514 | 0.8040 | 3.2575 | -305.6511 | -244.6188 | -2.9207 | -2.8589 |
+| 0.4765 | 0.88 | 1700 | 0.4296 | 0.3788 | -2.6225 | 0.8060 | 3.0012 | -302.3619 | -243.8926 | -2.9836 | -2.9241 |
+| 0.4783 | 0.93 | 1800 | 0.4422 | 0.0944 | -2.9868 | 0.8040 | 3.0812 | -306.0055 | -246.7358 | -2.9534 | -2.8865 |
+| 0.465 | 0.98 | 1900 | 0.4434 | 0.5028 | -2.3326 | 0.7960 | 2.8354 | -299.4631 | -242.6521 | -2.9355 | -2.8713 |
+| 0.0921 | 1.03 | 2000 | 0.4447 | 0.1567 | -3.4476 | 0.8120 | 3.6043 | -310.6131 | -246.1128 | -2.8519 | -2.7858 |
+| 0.0776 | 1.08 | 2100 | 0.4776 | 0.0909 | -3.9422 | 0.8140 | 4.0330 | -315.5593 | -246.7717 | -2.8412 | -2.7763 |
+| 0.0679 | 1.14 | 2200 | 0.4770 | -0.6731 | -4.8208 | 0.8240 | 4.1477 | -324.3449 | -254.4110 | -2.8085 | -2.7446 |
+| 0.0696 | 1.19 | 2300 | 0.4886 | -0.0248 | -4.1796 | 0.8160 | 4.1548 | -317.9334 | -247.9280 | -2.8622 | -2.8014 |
+| 0.1026 | 1.24 | 2400 | 0.4862 | 0.1088 | -3.8957 | 0.8160 | 4.0044 | -315.0940 | -246.5922 | -2.8702 | -2.8103 |
+| 0.104 | 1.29 | 2500 | 0.5141 | -0.6043 | -5.0727 | 0.8080 | 4.4684 | -326.8640 | -253.7228 | -2.8105 | -2.7535 |
+| 0.0728 | 1.34 | 2600 | 0.5166 | -0.5809 | -4.9937 | 0.8080 | 4.4128 | -326.0744 | -253.4896 | -2.8659 | -2.8016 |
+| 0.0844 | 1.39 | 2700 | 0.4835 | -0.6211 | -4.6437 | 0.8160 | 4.0226 | -322.5744 | -253.8915 | -2.8901 | -2.8305 |
+| 0.0733 | 1.45 | 2800 | 0.4738 | -0.1863 | -4.1760 | 0.8120 | 3.9897 | -317.8976 | -249.5429 | -2.9311 | -2.8814 |
+| 0.1837 | 1.5 | 2900 | 0.4764 | -0.0201 | -4.2761 | 0.8060 | 4.2560 | -318.8984 | -247.8809 | -2.9295 | -2.8720 |
+| 0.2113 | 1.55 | 3000 | 0.4709 | -0.0570 | -3.9772 | 0.8080 | 3.9202 | -315.9093 | -248.2498 | -2.8978 | -2.8435 |
+| 0.1858 | 1.6 | 3100 | 0.4769 | -0.1959 | -4.2238 | 0.7960 | 4.0278 | -318.3751 | -249.6395 | -2.9043 | -2.8498 |
+| 0.095 | 1.65 | 3200 | 0.4939 | -0.3083 | -4.3033 | 0.8120 | 3.9950 | -319.1705 | -250.7627 | -2.9288 | -2.8688 |
+| 0.1147 | 1.7 | 3300 | 0.4897 | -0.4599 | -4.7081 | 0.8080 | 4.2482 | -323.2183 | -252.2793 | -2.9112 | -2.8484 |
+| 0.1677 | 1.76 | 3400 | 0.4930 | -0.7465 | -5.1191 | 0.8200 | 4.3726 | -327.3288 | -255.1453 | -2.8408 | -2.7809 |
+| 0.0581 | 1.81 | 3500 | 0.4859 | -0.2916 | -4.5176 | 0.8180 | 4.2259 | -321.3130 | -250.5966 | -2.8749 | -2.8191 |
+| 0.053 | 1.86 | 3600 | 0.4978 | -0.6092 | -5.0514 | 0.8220 | 4.4422 | -326.6519 | -253.7722 | -2.8885 | -2.8300 |
+| 0.0603 | 1.91 | 3700 | 0.4830 | -0.7539 | -5.0723 | 0.8060 | 4.3184 | -326.8602 | -255.2187 | -2.8710 | -2.8075 |
+| 0.1269 | 1.96 | 3800 | 0.4793 | -0.4331 | -4.5194 | 0.8160 | 4.0863 | -321.3315 | -252.0114 | -2.9121 | -2.8554 |
+| 0.0191 | 2.01 | 3900 | 0.4803 | -0.4886 | -4.9886 | 0.8160 | 4.5000 | -326.0231 | -252.5659 | -2.8857 | -2.8246 |
+| 0.0168 | 2.07 | 4000 | 0.5259 | -1.0235 | -6.1251 | 0.8060 | 5.1016 | -337.3882 | -257.9146 | -2.8419 | -2.7775 |
+| 0.0114 | 2.12 | 4100 | 0.5714 | -1.5737 | -7.0255 | 0.8140 | 5.4519 | -346.3929 | -263.4171 | -2.8249 | -2.7582 |
+| 0.0114 | 2.17 | 4200 | 0.5547 | -1.8288 | -7.2840 | 0.8020 | 5.4552 | -348.9774 | -265.9677 | -2.8102 | -2.7409 |
+| 0.0482 | 2.22 | 4300 | 0.5437 | -1.1582 | -6.4741 | 0.8140 | 5.3159 | -340.8786 | -259.2626 | -2.8513 | -2.7874 |
+| 0.0172 | 2.27 | 4400 | 0.5489 | -1.5961 | -7.1623 | 0.8100 | 5.5662 | -347.7602 | -263.6409 | -2.8474 | -2.7836 |
+| 0.1044 | 2.32 | 4500 | 0.5818 | -1.8548 | -7.7495 | 0.8140 | 5.8947 | -353.6325 | -266.2277 | -2.8482 | -2.7839 |
+| 0.012 | 2.37 | 4600 | 0.5813 | -1.6912 | -7.5587 | 0.8160 | 5.8675 | -351.7242 | -264.5919 | -2.8512 | -2.7866 |
+| 0.0122 | 2.43 | 4700 | 0.6052 | -2.2384 | -8.3688 | 0.8060 | 6.1304 | -359.8252 | -270.0639 | -2.8210 | -2.7558 |
+| 0.0636 | 2.48 | 4800 | 0.5867 | -1.8483 | -7.7813 | 0.8140 | 5.9330 | -353.9502 | -266.1630 | -2.8455 | -2.7797 |
+| 0.0125 | 2.53 | 4900 | 0.5878 | -1.9082 | -7.7997 | 0.8140 | 5.8915 | -354.1346 | -266.7619 | -2.8342 | -2.7687 |
+| 0.0105 | 2.58 | 5000 | 0.5969 | -2.1624 | -8.2116 | 0.8120 | 6.0492 | -358.2536 | -269.3045 | -2.8144 | -2.7498 |
+| 0.0207 | 2.63 | 5100 | 0.6008 | -2.1674 | -8.2218 | 0.8120 | 6.0544 | -358.3557 | -269.3546 | -2.8197 | -2.7557 |
+| 0.0103 | 2.68 | 5200 | 0.6214 | -2.3910 | -8.6148 | 0.8060 | 6.2238 | -362.2856 | -271.5901 | -2.8181 | -2.7546 |
+| 0.0035 | 2.74 | 5300 | 0.6090 | -2.3006 | -8.4330 | 0.8120 | 6.1324 | -360.4677 | -270.6860 | -2.8048 | -2.7436 |
+| 0.0145 | 2.79 | 5400 | 0.6056 | -2.1076 | -8.1956 | 0.8120 | 6.0880 | -358.0930 | -268.7557 | -2.8059 | -2.7451 |
+| 0.0115 | 2.84 | 5500 | 0.5965 | -2.0098 | -7.9907 | 0.8160 | 5.9809 | -356.0446 | -267.7783 | -2.8139 | -2.7522 |
+| 0.0321 | 2.89 | 5600 | 0.6051 | -2.0432 | -8.1034 | 0.8080 | 6.0602 | -357.1714 | -268.1118 | -2.8136 | -2.7510 |
+| 0.0087 | 2.94 | 5700 | 0.6041 | -2.0226 | -8.0892 | 0.8140 | 6.0666 | -357.0298 | -267.9061 | -2.8100 | -2.7475 |
+| 0.0057 | 2.99 | 5800 | 0.6031 | -1.9575 | -8.0080 | 0.8140 | 6.0505 | -356.2176 | -267.2556 | -2.8082 | -2.7457 |
 
 ### Framework versions
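The final evaluation metrics above are internally consistent and easy to spot-check: Rewards/margins should equal Rewards/chosen minus Rewards/rejected, and the total batch sizes should be the per-device sizes times the four devices (consistent with a gradient accumulation of 1, which the card does not state explicitly). A minimal sketch:

```python
# Spot-check the evaluation metrics and batch sizes reported in the model card.
rewards_chosen = -1.9595
rewards_rejected = -8.0100
rewards_margins = 6.0505

# Margins = chosen reward - rejected reward (values rounded to 4 decimals).
assert abs((rewards_chosen - rewards_rejected) - rewards_margins) < 1e-4

# Total batch size = per-device batch size x num_devices
# (assumes gradient_accumulation_steps = 1, not stated in the card).
assert 8 * 4 == 32   # total_train_batch_size
assert 4 * 4 == 16   # total_eval_batch_size
```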
all_results.json
CHANGED
@@ -1,21 +1,21 @@
 {
     "epoch": 3.0,
-    "eval_logits/chosen": -2.
-    "eval_logits/rejected": -2.
-    "eval_logps/chosen": -
-    "eval_logps/rejected": -
-    "eval_loss": 0.
-    "eval_rewards/accuracies": 0.
-    "eval_rewards/chosen": -
-    "eval_rewards/margins":
-    "eval_rewards/rejected": -
-    "eval_runtime":
+    "eval_logits/chosen": -2.746201753616333,
+    "eval_logits/rejected": -2.8084917068481445,
+    "eval_logps/chosen": -267.275634765625,
+    "eval_logps/rejected": -356.2374267578125,
+    "eval_loss": 0.6013044714927673,
+    "eval_rewards/accuracies": 0.8119999766349792,
+    "eval_rewards/chosen": -1.9595460891723633,
+    "eval_rewards/margins": 6.0504584312438965,
+    "eval_rewards/rejected": -8.010004043579102,
+    "eval_runtime": 278.5463,
     "eval_samples": 2000,
-    "eval_samples_per_second":
-    "eval_steps_per_second": 0.
-    "train_loss": 0.
-    "train_runtime":
+    "eval_samples_per_second": 7.18,
+    "eval_steps_per_second": 0.449,
+    "train_loss": 0.19806672207460788,
+    "train_runtime": 74526.9689,
     "train_samples": 61966,
-    "train_samples_per_second":
-    "train_steps_per_second": 0.
+    "train_samples_per_second": 2.494,
+    "train_steps_per_second": 0.078
 }
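The throughput figures in all_results.json follow from the sample counts and runtimes, assuming the usual HF Trainer conventions (eval throughput is samples over eval runtime; train throughput counts samples across all 3 epochs; eval steps are 2000 samples at a total eval batch size of 16). A quick arithmetic check:

```python
# Re-derive the throughput numbers from the raw counts in all_results.json.
eval_samples, eval_runtime = 2000, 278.5463
train_samples, train_runtime, epochs = 61966, 74526.9689, 3.0

assert round(eval_samples / eval_runtime, 2) == 7.18               # eval_samples_per_second
assert round((eval_samples / 16) / eval_runtime, 3) == 0.449       # eval_steps_per_second
assert round(epochs * train_samples / train_runtime, 3) == 2.494   # train_samples_per_second
```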
eval_results.json
CHANGED
@@ -1,16 +1,16 @@
 {
     "epoch": 3.0,
-    "eval_logits/chosen": -2.
-    "eval_logits/rejected": -2.
-    "eval_logps/chosen": -
-    "eval_logps/rejected": -
-    "eval_loss": 0.
-    "eval_rewards/accuracies": 0.
-    "eval_rewards/chosen": -
-    "eval_rewards/margins":
-    "eval_rewards/rejected": -
-    "eval_runtime":
+    "eval_logits/chosen": -2.746201753616333,
+    "eval_logits/rejected": -2.8084917068481445,
+    "eval_logps/chosen": -267.275634765625,
+    "eval_logps/rejected": -356.2374267578125,
+    "eval_loss": 0.6013044714927673,
+    "eval_rewards/accuracies": 0.8119999766349792,
+    "eval_rewards/chosen": -1.9595460891723633,
+    "eval_rewards/margins": 6.0504584312438965,
+    "eval_rewards/rejected": -8.010004043579102,
+    "eval_runtime": 278.5463,
     "eval_samples": 2000,
-    "eval_samples_per_second":
-    "eval_steps_per_second": 0.
+    "eval_samples_per_second": 7.18,
+    "eval_steps_per_second": 0.449
 }
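The `eval_rewards/*` and `eval_logps/*` keys match the metrics logged by TRL's DPOTrainer, which suggests (the card itself does not say) that this checkpoint was preference-tuned with DPO. As a hedged sketch of how those implicit rewards relate to the loss, with `beta` assumed at TRL's default of 0.1:

```python
import math

def dpo_terms(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Implicit DPO rewards and loss for one preference pair (a sketch,
    not this repo's training code)."""
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    return loss, reward_chosen, reward_rejected

# With equal policy and reference log-probs the margin is 0 and loss = log(2).
loss, _, _ = dpo_terms(-267.0, -356.0, -267.0, -356.0)
assert abs(loss - math.log(2.0)) < 1e-9
```

A larger reward margin drives the loss toward zero, which is why `rewards/margins` grows steadily in the training table while `rewards/chosen` and `rewards/rejected` both drift negative.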
model-00001-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:30799b1c1ca3ea668f06802ba4955898fa0b5db1587e5631c64a1a254103153d
 size 4943162336
model-00002-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:77e03a3d268216d271df0c9f332883fd3d0d0c00b86ae884cc9869ebbfaef0d1
 size 4999819336
model-00003-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:55bcaaaffe978c185af8fbb3177120a0e1b25d723c972a6f82d3b9c330f17e1c
 size 4540516344
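Each safetensors entry in this diff is a Git LFS pointer file, not the weights themselves: three lines giving the spec version, the sha256 of the real payload, and its size in bytes. A small sketch (hypothetical helpers, not part of this repo) for parsing a pointer and verifying a downloaded file against it:

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a {version, oid, size} dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "oid": fields["oid"].removeprefix("sha256:"),
        "size": int(fields["size"]),
    }

def matches_pointer(payload: bytes, pointer: dict) -> bool:
    """Check a payload's size and sha256 digest against its LFS pointer."""
    return (len(payload) == pointer["size"]
            and hashlib.sha256(payload).hexdigest() == pointer["oid"])
```

This is how the new `oid sha256:...` lines above can be used to confirm that a locally downloaded shard is the exact file this commit recorded.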
runs/Dec27_17-59-29_babel-5-3/events.out.tfevents.1703718038.babel-5-3.969100.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a803b10ddbdd470fd7c864c27ea27472cfac7e4452455c59ca66d8fd8316a46f
+size 416385
runs/Dec27_17-59-29_babel-5-3/events.out.tfevents.1703792843.babel-5-3.969100.1
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:63a746fb124780ba5a8ab8f708f822d1b02bdbd4536e13ef2915e7add757e694
+size 828
train_results.json
CHANGED
@@ -1,8 +1,8 @@
 {
     "epoch": 3.0,
-    "train_loss": 0.
-    "train_runtime":
+    "train_loss": 0.19806672207460788,
+    "train_runtime": 74526.9689,
     "train_samples": 61966,
-    "train_samples_per_second":
-    "train_steps_per_second": 0.
+    "train_samples_per_second": 2.494,
+    "train_steps_per_second": 0.078
 }
trainer_state.json
CHANGED
The diff for this file is too large to render; see the raw diff.
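trainer_state.json, though too large to render here, stores the HF Trainer's `log_history`, which is where per-step entries like the evaluation table in the README come from. A sketch of pulling the eval-loss curve back out, assuming the standard Trainer layout:

```python
import json

def eval_loss_curve(path):
    """Return (step, eval_loss) pairs from a HF trainer_state.json."""
    with open(path) as f:
        state = json.load(f)
    return [(entry["step"], entry["eval_loss"])
            for entry in state.get("log_history", [])
            if "eval_loss" in entry]
```

Training-loss entries (which carry `loss` rather than `eval_loss`) are skipped by the filter, so the result lines up with the Validation Loss column of the README table.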
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:49bac97a38d0e5fbdaa25a18765e70e6e313e87821bacfe0b97cc49ca8296f79
+size 5688