lapp0 committed
Commit 430ed51 · verified · 1 Parent(s): 2f6d6ef

End of training

README.md CHANGED
@@ -16,14 +16,14 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
- - eval_enwikippl: 475.6911
- - eval_frwikippl: 1883.2976
- - eval_zhwikippl: 753.3747
- - eval_tinystoriesppl: 508.0900
- - eval_loss: 1.6984
- - eval_runtime: 21.181
- - eval_samples_per_second: 47.212
- - eval_steps_per_second: 11.803
+ - eval_enwikippl: 452.9807
+ - eval_frwikippl: 741.6703
+ - eval_zhwikippl: 169.7969
+ - eval_tinystoriesppl: 694.5760
+ - eval_loss: 1.2502
+ - eval_runtime: 21.1964
+ - eval_samples_per_second: 47.178
+ - eval_steps_per_second: 11.794
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
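For reference, the `*ppl` metrics above are perplexities (presumably on English/French/Chinese Wikipedia and TinyStories slices), where lower is better. Note that `eval_loss` here is the distillation objective (a KL term, per the config below), not a cross-entropy: exp(1.2502) ≈ 3.5, nowhere near the reported perplexities. A minimal sketch of the conventional perplexity computation for a causal LM, assuming a Hugging Face-style model (this is the standard definition, not necessarily Distily's exact evaluation code):

```python
import torch
import torch.nn.functional as F

def perplexity(model, input_ids: torch.Tensor) -> float:
    """exp(mean per-token cross-entropy) for a causal LM."""
    with torch.no_grad():
        logits = model(input_ids).logits        # (batch, seq, vocab)
    # Shift so that position t predicts token t+1.
    nll = F.cross_entropy(
        logits[:, :-1].flatten(0, 1),           # (batch*(seq-1), vocab)
        input_ids[:, 1:].flatten(),             # (batch*(seq-1),)
    )
    return nll.exp().item()
```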
@@ -48,7 +48,7 @@ More information needed
 The following hyperparameters were used during training:
 - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
 - train_embeddings: True
- - learning_rate: 4e-06
+ - learning_rate: 1e-05
 - train_batch_size: 1
 - eval_batch_size: 4
 - seed: 42
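The `distillation_objective` line says the entire training signal is a weight-1 KL divergence on the logits; the hidden-state (`hs`) and attention (`attn`) components are configured but given weight 0, with no layer mapping or projection. A sketch of such a logits-only KL loss in PyTorch (the temperature handling and reduction are assumptions; Distily's internals may differ):

```python
import torch
import torch.nn.functional as F

def logits_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) over the vocabulary, averaged over tokens."""
    t = temperature
    student_logp = F.log_softmax(student_logits / t, dim=-1).flatten(0, -2)
    teacher_logp = F.log_softmax(teacher_logits / t, dim=-1).flatten(0, -2)
    # log_target=True keeps the teacher distribution in log space;
    # "batchmean" divides by the number of token positions.
    return F.kl_div(student_logp, teacher_logp,
                    log_target=True, reduction="batchmean") * (t * t)
```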
@@ -63,27 +63,27 @@ Peak GPU Memory: 3.9285 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 270.2348 | 76.8142 | | | | | 671.1238 | 22.8030 |
- | 0 | 0 | 120078.375 | 1867851235328.0 | 18.7920 | 21.4263 | 46.672 | 11.668 | 72.8770 | 4013754155008.0 |
- | 5000 | 0.0505 | 463.2989 | 2220.3191 | 1.7597 | 21.4296 | 46.664 | 11.666 | 465.7518 | 1000.1456 |
- | 10000 | 0.1010 | 482.0157 | 1842.5612 | 1.7109 | 21.3947 | 46.741 | 11.685 | 518.0355 | 839.0040 |
- | 15000 | 0.1515 | 474.3848 | 1847.8888 | 1.7034 | 21.3589 | 46.819 | 11.705 | 506.2872 | 714.8962 |
- | 20000 | 0.2020 | 475.7832 | 1838.8660 | 1.7042 | 21.391 | 46.749 | 11.687 | 508.0481 | 751.2170 |
- | 25000 | 0.2525 | 494.6881 | 1854.0813 | 1.7025 | 21.5144 | 46.481 | 11.62 | 541.3933 | 699.7526 |
- | 30000 | 0.3030 | 470.2776 | 1916.4137 | 1.7054 | 21.5321 | 46.442 | 11.611 | 500.7306 | 709.5754 |
- | 35000 | 0.3535 | 489.6271 | 1885.3552 | 1.7014 | 21.4509 | 46.618 | 11.655 | 532.2521 | 748.5163 |
- | 40000 | 0.4040 | 516.1402 | 1929.0752 | 1.7038 | 21.2233 | 47.118 | 11.78 | 576.5936 | 754.3302 |
- | 45000 | 0.4545 | 456.2205 | 1928.9390 | 1.7070 | 21.1932 | 47.185 | 11.796 | 478.0193 | 808.4593 |
- | 50000 | 0.5051 | 505.5353 | 1946.6123 | 1.7023 | 21.2286 | 47.106 | 11.777 | 557.7241 | 780.4835 |
- | 55000 | 0.5556 | 472.1027 | 1931.3862 | 1.7005 | 21.2292 | 47.105 | 11.776 | 503.1787 | 760.6457 |
- | 60000 | 0.6061 | 476.9457 | 1870.4763 | 1.6989 | 21.1827 | 47.208 | 11.802 | 511.0809 | 754.1288 |
- | 65000 | 0.6566 | 468.1150 | 1897.7445 | 1.6984 | 21.2396 | 47.082 | 11.77 | 499.0363 | 760.3411 |
- | 70000 | 0.7071 | 483.1840 | 1883.0992 | 1.6974 | 21.1979 | 47.174 | 11.794 | 521.3865 | 761.0517 |
- | 75000 | 0.7576 | 477.2413 | 1875.6208 | 1.6995 | 21.1986 | 47.173 | 11.793 | 511.2922 | 761.1025 |
- | 80000 | 0.8081 | 478.3332 | 1883.3641 | 1.6985 | 21.3287 | 46.885 | 11.721 | 513.3676 | 751.1171 |
- | 85000 | 0.8586 | 475.8018 | 1880.9122 | 1.6979 | 21.1634 | 47.251 | 11.813 | 508.5522 | 750.0656 |
- | 90000 | 0.9091 | 475.8753 | 1881.9717 | 1.6980 | 21.2695 | 47.016 | 11.754 | 508.5944 | 751.9692 |
- | 95000 | 0.9596 | 475.6727 | 1882.6350 | 1.6983 | 21.2248 | 47.115 | 11.779 | 507.8802 | 753.2742 |
- | 99000 | 1.0 | 475.6911 | 1883.2976 | 1.6984 | 21.181 | 47.212 | 11.803 | 508.0900 | 753.3747 |
+ | 0 | 0 | 120078.375 | 1867851235328.0 | 18.7920 | 21.1643 | 47.249 | 11.812 | 72.8770 | 4013754155008.0 |
+ | 5000 | 0.0505 | 399.8896 | 1364.9200 | 1.5750 | 21.223 | 47.119 | 11.78 | 430.3431 | 486.9932 |
+ | 10000 | 0.1010 | 366.0540 | 968.1008 | 1.4975 | 21.2542 | 47.05 | 11.762 | 410.0440 | 300.9413 |
+ | 15000 | 0.1515 | 382.6534 | 990.8644 | 1.4377 | 21.1883 | 47.196 | 11.799 | 455.3961 | 243.5069 |
+ | 20000 | 0.2020 | 372.0864 | 985.5745 | 1.4590 | 21.2537 | 47.051 | 11.763 | 430.2186 | 317.8063 |
+ | 25000 | 0.2525 | 459.8662 | 802.9102 | 1.3109 | 21.2174 | 47.131 | 11.783 | 674.2657 | 183.8540 |
+ | 30000 | 0.3030 | 452.4371 | 822.7448 | 1.2777 | 21.2291 | 47.105 | 11.776 | 674.3492 | 162.7067 |
+ | 35000 | 0.3535 | 476.7241 | 805.2602 | 1.2741 | 21.2169 | 47.132 | 11.783 | 736.0758 | 174.6150 |
+ | 40000 | 0.4040 | 453.2438 | 770.2305 | 1.2733 | 21.1947 | 47.181 | 11.795 | 679.9471 | 163.0870 |
+ | 45000 | 0.4545 | 460.5169 | 781.4591 | 1.2687 | 21.2116 | 47.144 | 11.786 | 700.2546 | 183.2052 |
+ | 50000 | 0.5051 | 479.0564 | 794.0530 | 1.2632 | 21.229 | 47.105 | 11.776 | 743.4755 | 181.4419 |
+ | 55000 | 0.5556 | 471.3993 | 748.4656 | 1.2630 | 21.215 | 47.137 | 11.784 | 731.375 | 172.6117 |
+ | 60000 | 0.6061 | 446.4142 | 775.7834 | 1.2687 | 21.1528 | 47.275 | 11.819 | 669.1851 | 164.7928 |
+ | 65000 | 0.6566 | 455.8672 | 744.0773 | 1.2538 | 21.2207 | 47.124 | 11.781 | 698.6068 | 164.4469 |
+ | 70000 | 0.7071 | 453.5074 | 740.2094 | 1.2513 | 21.3501 | 46.838 | 11.71 | 697.8277 | 168.6457 |
+ | 75000 | 0.7576 | 450.4874 | 723.2042 | 1.2535 | 21.2028 | 47.164 | 11.791 | 685.8463 | 167.9272 |
+ | 80000 | 0.8081 | 455.6377 | 745.9662 | 1.2523 | 21.2178 | 47.13 | 11.783 | 701.7324 | 170.4892 |
+ | 85000 | 0.8586 | 447.3922 | 746.4918 | 1.2509 | 21.2165 | 47.133 | 11.783 | 681.8325 | 168.7976 |
+ | 90000 | 0.9091 | 453.0859 | 740.9397 | 1.2505 | 21.1987 | 47.173 | 11.793 | 696.0992 | 169.7290 |
+ | 95000 | 0.9596 | 451.3083 | 741.0439 | 1.2504 | 21.5668 | 46.368 | 11.592 | 690.2544 | 169.7969 |
+ | 99000 | 1.0 | 452.9807 | 741.6703 | 1.2502 | 21.1964 | 47.178 | 11.794 | 694.5760 | 169.7969 |
 
 ### Framework versions
 - Distily 0.2.0
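One reading aid for the table: with `train_batch_size: 1`, the run takes 99,000 optimizer steps to cover the data once, and the `epoch` column is consistent with `step / 99000` (inferred from the final row reaching epoch 1.0):

```python
total_steps = 99_000  # inferred from the final table row (epoch 1.0)
for step in (5_000, 50_000, 99_000):
    print(step, round(step / total_steps, 4))  # -> 0.0505, 0.5051, 1.0
```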
 
logs/copy_teacher_modules=_(_lm_head___False)_, learning_rate=1e-05/events.out.tfevents.1724042197.f383272e719b ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cd831520fb5d5ea62af30665721726716a3ab479121fcd68836bba0d436a8fd9
+ size 312
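The added file under `logs/` is stored with Git LFS, so the repository records only a pointer (the `version`/`oid`/`size` triplet) while the 312-byte TensorBoard event file lives in LFS storage. Once fetched (e.g. with `git lfs pull`), such a tfevents file can be inspected with TensorBoard's event reader; the scalar tag name below is a hypothetical example:

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

path = ("logs/copy_teacher_modules=_(_lm_head___False)_, learning_rate=1e-05/"
        "events.out.tfevents.1724042197.f383272e719b")
acc = EventAccumulator(path)
acc.Reload()                          # parse the event file
print(acc.Tags()["scalars"])          # list the available scalar tags
for ev in acc.Scalars("eval/loss"):   # "eval/loss" is a guessed tag name
    print(ev.step, ev.value)
```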