lapp0 committed
Commit b27a60b · verified · 1 Parent(s): 69999e0

End of training
README.md CHANGED
@@ -16,14 +16,14 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
  It achieves the following results on the evaluation set:
- - eval_enwikippl: 3656.4097
- - eval_frwikippl: 998935.5
- - eval_zhwikippl: 3651930.25
- - eval_tinystoriesppl: 853.0071
- - eval_loss: 4.7666
- - eval_runtime: 21.3363
- - eval_samples_per_second: 46.868
- - eval_steps_per_second: 11.717
+ - eval_enwikippl: 475.6911
+ - eval_frwikippl: 1883.2976
+ - eval_zhwikippl: 753.3747
+ - eval_tinystoriesppl: 508.0900
+ - eval_loss: 1.6984
+ - eval_runtime: 21.181
+ - eval_samples_per_second: 47.212
+ - eval_steps_per_second: 11.803
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
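The eval_*ppl values in this hunk are corpus perplexities (enwiki, frwiki, zhwiki, TinyStories): the exponential of the model's mean token-level negative log-likelihood on each corpus. Distily computes these internally during evaluation; the following is only a minimal sketch of the same quantity using a plain transformers causal LM, with the model id and sample text chosen for illustration rather than taken from the card's eval pipeline:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text):
    # Perplexity = exp(mean token negative log-likelihood); passing labels
    # equal to input_ids makes the model return the shifted LM loss.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

model = AutoModelForCausalLM.from_pretrained("gpt2")  # teacher; swap in the student to compare
tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(perplexity(model, tokenizer, "Once upon a time there was a fox."))
```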
@@ -48,7 +48,7 @@ More information needed
  The following hyperparameters were used during training:
  - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  - train_embeddings: True
- - learning_rate: 1e-06
+ - learning_rate: 4e-06
  - train_batch_size: 1
  - eval_batch_size: 4
  - seed: 42
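The distillation_objective line records that only the logits component carries weight: a KL-divergence loss between the teacher's and the student's next-token distributions, with the hidden-state (hs) and attention (attn) components disabled (weight=0). A minimal sketch of that loss shape, assuming the kl objective maps onto PyTorch's kl_div; Distily's exact reduction and any temperature scaling may differ:

```python
import torch.nn.functional as F

def kl_logits_loss(student_logits, teacher_logits):
    # Both tensors have shape (batch, seq_len, vocab_size).
    # Forward KL(teacher || student) per token position; "batchmean"
    # divides by the first dim, so fold batch and sequence together.
    s = F.log_softmax(student_logits, dim=-1).flatten(0, 1)
    t = F.softmax(teacher_logits, dim=-1).flatten(0, 1)
    return F.kl_div(s, t, reduction="batchmean")
```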
@@ -63,27 +63,27 @@ Peak GPU Memory: 3.9285 GB
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 270.2348 | 76.8142 | | | | | 671.1238 | 22.8030 |
- | 0 | 0 | 154372.75 | 3250976718848.0 | 19.5745 | 21.9663 | 45.524 | 11.381 | 81.0696 | 3865309085696.0 |
- | 5000 | 0.0505 | 3591.7734 | 994582.8125 | 4.7672 | 21.3712 | 46.792 | 11.698 | 830.4290 | 3622816.25 |
- | 10000 | 0.1010 | 3615.9900 | 994303.0 | 4.7657 | 21.1745 | 47.227 | 11.807 | 836.8727 | 3641715.75 |
- | 15000 | 0.1515 | 3671.0986 | 1001083.1875 | 4.7665 | 21.6571 | 46.174 | 11.544 | 857.9221 | 3659244.25 |
- | 20000 | 0.2020 | 3636.2166 | 997810.125 | 4.7657 | 21.4908 | 46.532 | 11.633 | 844.9362 | 3650955.25 |
- | 25000 | 0.2525 | 3617.9526 | 996265.0 | 4.7661 | 21.7815 | 45.91 | 11.478 | 837.2188 | 3639771.25 |
- | 30000 | 0.3030 | 3649.9016 | 997669.25 | 4.7663 | 21.6219 | 46.249 | 11.562 | 851.5274 | 3646090.75 |
- | 35000 | 0.3535 | 3636.2166 | 998372.625 | 4.7663 | 21.533 | 46.44 | 11.61 | 845.2155 | 3649983.75 |
- | 40000 | 0.4040 | 3632.8369 | 998090.875 | 4.7660 | 21.5429 | 46.419 | 11.605 | 843.6451 | 3651930.25 |
- | 45000 | 0.4545 | 3662.2236 | 1001083.1875 | 4.7662 | 21.5033 | 46.505 | 11.626 | 855.9031 | 3661199.0 |
- | 50000 | 0.5051 | 3627.9138 | 997810.125 | 4.7659 | 21.4372 | 46.648 | 11.662 | 841.2076 | 3650955.25 |
- | 55000 | 0.5556 | 3650.6084 | 999075.5625 | 4.7662 | 21.5382 | 46.429 | 11.607 | 850.9644 | 3649983.75 |
- | 60000 | 0.6061 | 3654.9941 | 999216.5625 | 4.7663 | 21.743 | 45.992 | 11.498 | 852.0205 | 3650955.25 |
- | 65000 | 0.6566 | 3649.6199 | 998090.875 | 4.7664 | 21.5302 | 46.446 | 11.612 | 851.1754 | 3649009.25 |
- | 70000 | 0.7071 | 3655.5625 | 998935.5 | 4.7663 | 21.6494 | 46.191 | 11.548 | 852.5842 | 3650955.25 |
- | 75000 | 0.7576 | 3652.7292 | 999497.75 | 4.7662 | 21.745 | 45.988 | 11.497 | 851.7739 | 3649009.25 |
- | 80000 | 0.8081 | 3649.6199 | 998231.75 | 4.7664 | 21.5478 | 46.408 | 11.602 | 851.4572 | 3649983.75 |
- | 85000 | 0.8586 | 3654.9941 | 998794.5 | 4.7664 | 21.5285 | 46.45 | 11.613 | 852.6545 | 3649983.75 |
- | 90000 | 0.9091 | 3658.1118 | 998653.5625 | 4.7664 | 21.5544 | 46.394 | 11.599 | 853.7831 | 3650955.25 |
- | 95000 | 0.9596 | 3656.9780 | 999497.75 | 4.7665 | 21.3927 | 46.745 | 11.686 | 852.7249 | 3649983.75 |
- | 99000 | 1.0 | 3656.4097 | 998935.5 | 4.7666 | 21.3363 | 46.868 | 11.717 | 853.0071 | 3651930.25 |
+ | 0 | 0 | 120078.375 | 1867851235328.0 | 18.7920 | 21.4263 | 46.672 | 11.668 | 72.8770 | 4013754155008.0 |
+ | 5000 | 0.0505 | 463.2989 | 2220.3191 | 1.7597 | 21.4296 | 46.664 | 11.666 | 465.7518 | 1000.1456 |
+ | 10000 | 0.1010 | 482.0157 | 1842.5612 | 1.7109 | 21.3947 | 46.741 | 11.685 | 518.0355 | 839.0040 |
+ | 15000 | 0.1515 | 474.3848 | 1847.8888 | 1.7034 | 21.3589 | 46.819 | 11.705 | 506.2872 | 714.8962 |
+ | 20000 | 0.2020 | 475.7832 | 1838.8660 | 1.7042 | 21.391 | 46.749 | 11.687 | 508.0481 | 751.2170 |
+ | 25000 | 0.2525 | 494.6881 | 1854.0813 | 1.7025 | 21.5144 | 46.481 | 11.62 | 541.3933 | 699.7526 |
+ | 30000 | 0.3030 | 470.2776 | 1916.4137 | 1.7054 | 21.5321 | 46.442 | 11.611 | 500.7306 | 709.5754 |
+ | 35000 | 0.3535 | 489.6271 | 1885.3552 | 1.7014 | 21.4509 | 46.618 | 11.655 | 532.2521 | 748.5163 |
+ | 40000 | 0.4040 | 516.1402 | 1929.0752 | 1.7038 | 21.2233 | 47.118 | 11.78 | 576.5936 | 754.3302 |
+ | 45000 | 0.4545 | 456.2205 | 1928.9390 | 1.7070 | 21.1932 | 47.185 | 11.796 | 478.0193 | 808.4593 |
+ | 50000 | 0.5051 | 505.5353 | 1946.6123 | 1.7023 | 21.2286 | 47.106 | 11.777 | 557.7241 | 780.4835 |
+ | 55000 | 0.5556 | 472.1027 | 1931.3862 | 1.7005 | 21.2292 | 47.105 | 11.776 | 503.1787 | 760.6457 |
+ | 60000 | 0.6061 | 476.9457 | 1870.4763 | 1.6989 | 21.1827 | 47.208 | 11.802 | 511.0809 | 754.1288 |
+ | 65000 | 0.6566 | 468.1150 | 1897.7445 | 1.6984 | 21.2396 | 47.082 | 11.77 | 499.0363 | 760.3411 |
+ | 70000 | 0.7071 | 483.1840 | 1883.0992 | 1.6974 | 21.1979 | 47.174 | 11.794 | 521.3865 | 761.0517 |
+ | 75000 | 0.7576 | 477.2413 | 1875.6208 | 1.6995 | 21.1986 | 47.173 | 11.793 | 511.2922 | 761.1025 |
+ | 80000 | 0.8081 | 478.3332 | 1883.3641 | 1.6985 | 21.3287 | 46.885 | 11.721 | 513.3676 | 751.1171 |
+ | 85000 | 0.8586 | 475.8018 | 1880.9122 | 1.6979 | 21.1634 | 47.251 | 11.813 | 508.5522 | 750.0656 |
+ | 90000 | 0.9091 | 475.8753 | 1881.9717 | 1.6980 | 21.2695 | 47.016 | 11.754 | 508.5944 | 751.9692 |
+ | 95000 | 0.9596 | 475.6727 | 1882.6350 | 1.6983 | 21.2248 | 47.115 | 11.779 | 507.8802 | 753.2742 |
+ | 99000 | 1.0 | 475.6911 | 1883.2976 | 1.6984 | 21.181 | 47.212 | 11.803 | 508.0900 | 753.3747 |
 
  ### Framework versions
  - Distily 0.2.0
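Net effect of this commit: with the learning rate raised from 1e-06 to 4e-06, final eval_loss falls from 4.7666 to 1.6984 and enwiki perplexity from ~3656 to ~476, though still well above the teacher's 270.23. The resulting student loads with the standard transformers API; the repo id below is a placeholder, not this checkpoint's confirmed Hub path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "lapp0/distily-gpt2"  # hypothetical id; substitute this model's actual Hub path
model = AutoModelForCausalLM.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)
inputs = tokenizer("Once upon a time", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```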
logs/copy_teacher_modules=_(_lm_head___False)_, learning_rate=4e-06/events.out.tfevents.1724029122.f383272e719b ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a699eff9728784cede2f3df6cf2bdf2d44ca7e9ce195c85c3645578323c09c7f
+ size 312
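The added TensorBoard log is stored via Git LFS, so the commit contains only the three-line pointer shown above (spec version, sha256 oid, byte size) rather than the event file itself. A small parser for that pointer format, written against the exact lines in this diff:

```python
def parse_lfs_pointer(text: str) -> dict:
    # A Git LFS pointer is "key value" lines; oid is "<algo>:<hex digest>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:a699eff9728784cede2f3df6cf2bdf2d44ca7e9ce195c85c3645578323c09c7f
size 312"""
print(parse_lfs_pointer(pointer))  # {'version': ..., 'algo': 'sha256', ...}
```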