---
license: apache-2.0
base_model: EleutherAI/pythia-12b-deduped
tags:
- generated_from_trainer
model-index:
- name: PE-12b-pythia
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# PE-12b-pythia

This model is a fine-tuned version of [EleutherAI/pythia-12b-deduped](https://huggingface.co/EleutherAI/pythia-12b-deduped) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1421
- Rewards/chosen: 3.5045
- Rewards/rejected: -2.3171
- Rewards/accuracies: 0.9441
- Rewards/margins: 5.8216
- Logps/rejected: -95.5639
- Logps/chosen: -116.1507
- Logits/rejected: -0.4604
- Logits/chosen: -0.4355
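
These metrics are related: `Rewards/margins` is the gap between the chosen and rejected rewards, and `Rewards/accuracies` is the fraction of evaluation pairs where the chosen response scores higher than the rejected one. A quick check of the reported numbers:

```python
# The reported margin is the difference between chosen and rejected rewards.
rewards_chosen = 3.5045
rewards_rejected = -2.3171

margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 5.8216, matching Rewards/margins above
```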

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-07
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
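
The total batch sizes above follow from the per-device settings: the effective training batch is the per-device batch size times the number of devices times the gradient accumulation steps, while evaluation uses no accumulation. A quick sanity check:

```python
# Values taken from the hyperparameter list above.
train_batch_size = 1
eval_batch_size = 2
num_devices = 8
gradient_accumulation_steps = 8

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices  # no accumulation at eval time

print(total_train_batch_size)  # 64
print(total_eval_batch_size)   # 16
```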

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.8825        | 0.05  | 100  | 0.8872          | 0.1884         | 0.1204           | 0.5056             | 0.0680          | -90.6889       | -122.7830    | -0.5017         | -0.4522       |
| 0.9136        | 0.09  | 200  | 0.8325          | 0.3253         | 0.0714           | 0.5894             | 0.2540          | -90.7870       | -122.5091    | -0.4960         | -0.4447       |
| 0.7507        | 0.14  | 300  | 0.7816          | 0.5741         | 0.2797           | 0.5670             | 0.2944          | -90.3703       | -122.0116    | -0.4909         | -0.4426       |
| 0.6142        | 0.18  | 400  | 0.6435          | 1.0753         | 0.4404           | 0.6369             | 0.6348          | -90.0489       | -121.0092    | -0.4793         | -0.4322       |
| 0.519         | 0.23  | 500  | 0.5196          | 1.7213         | 0.5624           | 0.7430             | 1.1590          | -89.8050       | -119.7171    | -0.4559         | -0.4084       |
| 0.4858        | 0.27  | 600  | 0.4351          | 2.2085         | 0.5923           | 0.7877             | 1.6162          | -89.7450       | -118.7428    | -0.4592         | -0.4138       |
| 0.4048        | 0.32  | 700  | 0.3878          | 2.6105         | 0.5736           | 0.8324             | 2.0369          | -89.7825       | -117.9388    | -0.4398         | -0.3953       |
| 0.3623        | 0.37  | 800  | 0.3383          | 2.7055         | 0.4610           | 0.8520             | 2.2446          | -90.0078       | -117.7487    | -0.4492         | -0.4046       |
| 0.308         | 0.41  | 900  | 0.3145          | 2.9742         | 0.3506           | 0.8520             | 2.6236          | -90.2285       | -117.2114    | -0.4381         | -0.3971       |
| 0.3092        | 0.46  | 1000 | 0.3125          | 3.1541         | 0.2687           | 0.8352             | 2.8854          | -90.3922       | -116.8515    | -0.4276         | -0.3926       |
| 0.2765        | 0.5   | 1100 | 0.2939          | 3.1208         | 0.1475           | 0.8603             | 2.9733          | -90.6347       | -116.9181    | -0.4615         | -0.4216       |
| 0.3058        | 0.55  | 1200 | 0.2772          | 2.9861         | -0.1371          | 0.8771             | 3.1232          | -91.2038       | -117.1875    | -0.4249         | -0.3887       |
| 0.2702        | 0.59  | 1300 | 0.2592          | 3.3217         | -0.0639          | 0.8715             | 3.3856          | -91.0574       | -116.5163    | -0.4497         | -0.4113       |
| 0.2316        | 0.64  | 1400 | 0.2491          | 3.3560         | -0.2934          | 0.8855             | 3.6494          | -91.5165       | -116.4477    | -0.4234         | -0.3869       |
| 0.2344        | 0.68  | 1500 | 0.2506          | 3.2223         | -0.2242          | 0.8687             | 3.4464          | -91.3780       | -116.7152    | -0.4515         | -0.4151       |
| 0.2332        | 0.73  | 1600 | 0.2350          | 3.2137         | -0.4070          | 0.8855             | 3.6207          | -91.7436       | -116.7324    | -0.4299         | -0.3936       |
| 0.2258        | 0.78  | 1700 | 0.2477          | 3.0894         | -0.5590          | 0.8939             | 3.6484          | -92.0476       | -116.9809    | -0.4316         | -0.3960       |
| 0.2526        | 0.82  | 1800 | 0.2277          | 3.2845         | -0.5527          | 0.8771             | 3.8373          | -92.0351       | -116.5907    | -0.4420         | -0.4076       |
| 0.2025        | 0.87  | 1900 | 0.2182          | 3.2061         | -0.8100          | 0.9022             | 4.0160          | -92.5496       | -116.7476    | -0.4319         | -0.3974       |
| 0.2253        | 0.91  | 2000 | 0.2149          | 3.2765         | -0.9756          | 0.9078             | 4.2521          | -92.8809       | -116.6067    | -0.4391         | -0.4023       |
| 0.2084        | 0.96  | 2100 | 0.2223          | 3.1160         | -1.0659          | 0.8939             | 4.1820          | -93.0615       | -116.9277    | -0.4283         | -0.3954       |
| 0.1896        | 1.0   | 2200 | 0.2100          | 3.1835         | -1.0131          | 0.8911             | 4.1966          | -92.9559       | -116.7927    | -0.4517         | -0.4154       |
| 0.2294        | 1.05  | 2300 | 0.2070          | 3.1205         | -1.0873          | 0.8939             | 4.2078          | -93.1043       | -116.9187    | -0.4412         | -0.4051       |
| 0.1897        | 1.1   | 2400 | 0.2011          | 3.1553         | -1.0875          | 0.9050             | 4.2428          | -93.1047       | -116.8492    | -0.4483         | -0.4136       |
| 0.1943        | 1.14  | 2500 | 0.1953          | 3.3317         | -1.2261          | 0.9022             | 4.5578          | -93.3819       | -116.4964    | -0.4488         | -0.4137       |
| 0.1749        | 1.19  | 2600 | 0.1975          | 3.2186         | -1.3232          | 0.8911             | 4.5419          | -93.5761       | -116.7225    | -0.4500         | -0.4160       |
| 0.1881        | 1.23  | 2700 | 0.1838          | 3.3207         | -1.3323          | 0.9274             | 4.6530          | -93.5944       | -116.5184    | -0.4262         | -0.3962       |
| 0.1611        | 1.28  | 2800 | 0.1833          | 3.2881         | -1.3588          | 0.9106             | 4.6469          | -93.6472       | -116.5835    | -0.4404         | -0.4091       |
| 0.1653        | 1.32  | 2900 | 0.1959          | 3.2545         | -1.6143          | 0.9190             | 4.8688          | -94.1584       | -116.6508    | -0.4252         | -0.3996       |
| 0.1613        | 1.37  | 3000 | 0.1779          | 3.3926         | -1.5190          | 0.9218             | 4.9117          | -93.9678       | -116.3744    | -0.4374         | -0.4071       |
| 0.1785        | 1.42  | 3100 | 0.1840          | 3.4053         | -1.6286          | 0.9246             | 5.0339          | -94.1868       | -116.3491    | -0.4280         | -0.3987       |
| 0.1544        | 1.46  | 3200 | 0.1686          | 3.5029         | -1.6389          | 0.9218             | 5.1418          | -94.2075       | -116.1539    | -0.4624         | -0.4309       |
| 0.1492        | 1.51  | 3300 | 0.1706          | 3.2854         | -1.8094          | 0.9330             | 5.0948          | -94.5485       | -116.5889    | -0.4148         | -0.3943       |
| 0.1719        | 1.55  | 3400 | 0.1691          | 3.5148         | -1.7457          | 0.9274             | 5.2605          | -94.4210       | -116.1301    | -0.4542         | -0.4253       |
| 0.1905        | 1.6   | 3500 | 0.1719          | 3.4941         | -1.7454          | 0.9246             | 5.2395          | -94.4204       | -116.1715    | -0.4479         | -0.4189       |
| 0.1354        | 1.64  | 3600 | 0.1749          | 3.5351         | -1.7024          | 0.9106             | 5.2375          | -94.3345       | -116.0895    | -0.4608         | -0.4303       |
| 0.1644        | 1.69  | 3700 | 0.1597          | 3.5736         | -1.6580          | 0.9246             | 5.2316          | -94.2457       | -116.0126    | -0.4469         | -0.4192       |
| 0.1598        | 1.73  | 3800 | 0.1613          | 3.6646         | -1.7035          | 0.9078             | 5.3681          | -94.3367       | -115.8306    | -0.4631         | -0.4349       |
| 0.1337        | 1.78  | 3900 | 0.1583          | 3.5502         | -1.8444          | 0.9134             | 5.3946          | -94.6184       | -116.0593    | -0.4658         | -0.4368       |
| 0.1534        | 1.83  | 4000 | 0.1572          | 3.5076         | -1.9137          | 0.9190             | 5.4213          | -94.7571       | -116.1446    | -0.4610         | -0.4328       |
| 0.1327        | 1.87  | 4100 | 0.1607          | 3.5711         | -1.9143          | 0.9218             | 5.4854          | -94.7583       | -116.0175    | -0.4404         | -0.4153       |
| 0.162         | 1.92  | 4200 | 0.1565          | 3.4852         | -2.0136          | 0.9330             | 5.4988          | -94.9568       | -116.1893    | -0.4641         | -0.4373       |
| 0.1471        | 1.96  | 4300 | 0.1524          | 3.5639         | -1.9766          | 0.9246             | 5.5406          | -94.8830       | -116.0319    | -0.4627         | -0.4338       |
| 0.1333        | 2.01  | 4400 | 0.1418          | 3.6173         | -1.9710          | 0.9162             | 5.5883          | -94.8717       | -115.9251    | -0.4608         | -0.4328       |
| 0.13          | 2.05  | 4500 | 0.1485          | 3.6275         | -1.9865          | 0.9358             | 5.6140          | -94.9027       | -115.9047    | -0.4604         | -0.4319       |
| 0.1311        | 2.1   | 4600 | 0.1503          | 3.4735         | -2.1194          | 0.9134             | 5.5928          | -95.1684       | -116.2128    | -0.4405         | -0.4123       |
| 0.1329        | 2.15  | 4700 | 0.1431          | 3.5793         | -2.1059          | 0.9218             | 5.6852          | -95.1415       | -116.0012    | -0.4519         | -0.4229       |
| 0.1346        | 2.19  | 4800 | 0.1494          | 3.6059         | -2.0642          | 0.9274             | 5.6701          | -95.0581       | -115.9479    | -0.4639         | -0.4332       |
| 0.1462        | 2.24  | 4900 | 0.1455          | 3.4721         | -2.1648          | 0.9218             | 5.6369          | -95.2593       | -116.2156    | -0.4553         | -0.4258       |
| 0.1221        | 2.28  | 5000 | 0.1538          | 3.6293         | -2.1472          | 0.9385             | 5.7764          | -95.2240       | -115.9012    | -0.4525         | -0.4268       |
| 0.1329        | 2.33  | 5100 | 0.1486          | 3.4734         | -2.1778          | 0.9358             | 5.6512          | -95.2853       | -116.2130    | -0.4578         | -0.4301       |
| 0.1284        | 2.37  | 5200 | 0.1527          | 3.4805         | -2.1670          | 0.9078             | 5.6474          | -95.2636       | -116.1988    | -0.4611         | -0.4329       |
| 0.1238        | 2.42  | 5300 | 0.1433          | 3.4570         | -2.1768          | 0.9274             | 5.6338          | -95.2832       | -116.2457    | -0.4451         | -0.4191       |
| 0.1317        | 2.46  | 5400 | 0.1421          | 3.5647         | -2.2232          | 0.9330             | 5.7880          | -95.3761       | -116.0303    | -0.4565         | -0.4342       |
| 0.131         | 2.51  | 5500 | 0.1478          | 3.4211         | -2.2681          | 0.9190             | 5.6892          | -95.4659       | -116.3175    | -0.4444         | -0.4147       |
| 0.1235        | 2.56  | 5600 | 0.1428          | 3.5292         | -2.2798          | 0.9413             | 5.8089          | -95.4892       | -116.1014    | -0.4485         | -0.4234       |
| 0.1122        | 2.6   | 5700 | 0.1445          | 3.6102         | -2.2363          | 0.9330             | 5.8465          | -95.4023       | -115.9393    | -0.4473         | -0.4233       |
| 0.1172        | 2.65  | 5800 | 0.1415          | 3.5813         | -2.1899          | 0.9246             | 5.7712          | -95.3095       | -115.9972    | -0.4648         | -0.4357       |
| 0.1257        | 2.69  | 5900 | 0.1428          | 3.4075         | -2.3047          | 0.9218             | 5.7122          | -95.5390       | -116.3447    | -0.4553         | -0.4269       |
| 0.1441        | 2.74  | 6000 | 0.1426          | 3.4287         | -2.3210          | 0.9190             | 5.7497          | -95.5717       | -116.3024    | -0.4673         | -0.4401       |
| 0.1359        | 2.78  | 6100 | 0.1479          | 3.4833         | -2.2993          | 0.9358             | 5.7826          | -95.5282       | -116.1931    | -0.4409         | -0.4173       |
| 0.1332        | 2.83  | 6200 | 0.1442          | 3.4741         | -2.2726          | 0.9330             | 5.7466          | -95.4748       | -116.2116    | -0.4512         | -0.4262       |
| 0.1454        | 2.88  | 6300 | 0.1397          | 3.4410         | -2.2911          | 0.9358             | 5.7320          | -95.5118       | -116.2778    | -0.4604         | -0.4355       |
| 0.1355        | 2.92  | 6400 | 0.1471          | 3.3740         | -2.3739          | 0.9330             | 5.7479          | -95.6775       | -116.4117    | -0.4473         | -0.4225       |
| 0.1114        | 2.97  | 6500 | 0.1397          | 3.4854         | -2.3222          | 0.9302             | 5.8076          | -95.5740       | -116.1889    | -0.4595         | -0.4345       |
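
The logged columns (`Rewards/chosen`, `Rewards/rejected`, `Rewards/margins`, `Logps/*`) match the metrics emitted by pairwise preference-optimization trainers such as TRL's `DPOTrainer`; assuming that setup, the loss on each pair is the negative log-sigmoid of the reward margin. A minimal sketch under that assumption:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise logistic loss on the reward margin: -log sigmoid(r_c - r_r)."""
    return -math.log(sigmoid(reward_chosen - reward_rejected))

# At the final mean eval margin (~5.82) the per-pair loss is tiny; the
# reported mean loss (0.1421) is larger because the average is dominated
# by the harder pairs with small or negative margins.
print(preference_loss(3.5045, -2.3171))
```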


### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1