Commit 6f853a5 (verified) by lapp0 · 1 Parent(s): 58277b2

End of training
README.md CHANGED
@@ -1,6 +1,7 @@
  ---
  base_model: gpt2
  library_name: Distily
+ license: mit
  tags:
  - generated_from_trainer
  model-index:
@@ -15,13 +16,13 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

  It achieves the following results on the evaluation set:
- - eval_enwikippl: 300.6556
- - eval_frwikippl: 1956.3597
- - eval_zhwikippl: 1863.9412
- - eval_loss: 1.5080
- - eval_runtime: 36.2633
- - eval_samples_per_second: 55.152
- - eval_steps_per_second: 6.894
+ - eval_enwikippl: 234.9056
+ - eval_frwikippl: 1372.0399
+ - eval_zhwikippl: 581.4163
+ - eval_loss: 1.3616
+ - eval_runtime: 35.2036
+ - eval_samples_per_second: 56.812
+ - eval_steps_per_second: 7.102

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
@@ -44,7 +45,7 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - distillation_objective: MultiObjective(logits_weight=1, logits_loss_fn=(fn:kl_divergence_loss()), hs_weight=0.2, hs_loss_fn=(fn:jsd_loss()), attn_weight=0, attn_loss_fn=(fn:soft_mse_loss()))
+ - distillation_objective: MultiObjective(logits_weight=1, logits_loss_fn=(fn:kl_divergence_loss()), hs_weight=0.2, hs_loss_fn=(fn:kl_divergence_loss()), attn_weight=0, attn_loss_fn=(fn:soft_mse_loss()))
  - train_embeddings: True
  - learning_rate: 4e-05
  - train_batch_size: 8
@@ -61,32 +62,32 @@ Peak GPU Memory: 8.0903 GB
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 30.2086 | 57.2728 | | | | | 18.1784 |
- | 0 | 0 | 58489.7812 | 56515.125 | 23.3780 | 36.5332 | 54.745 | 6.843 | 57894.0352 |
- | 1000 | 0.0404 | 1043.9308 | 5520.7402 | 2.3618 | 36.284 | 55.121 | 6.89 | 18612.1113 |
- | 2000 | 0.0808 | 741.2775 | 3967.9961 | 2.0659 | 36.0061 | 55.546 | 6.943 | 5552.3940 |
- | 3000 | 0.1212 | 605.6614 | 3446.1201 | 1.9370 | 36.1801 | 55.279 | 6.91 | 3337.375 |
- | 4000 | 0.1616 | 522.5402 | 3105.9614 | 1.8315 | 36.1309 | 55.354 | 6.919 | 2158.2932 |
- | 5000 | 0.2020 | 457.7764 | 2678.5098 | 1.7492 | 36.1083 | 55.389 | 6.924 | 2813.0127 |
- | 6000 | 0.2424 | 401.6627 | 2722.6836 | 1.6794 | 36.1519 | 55.322 | 6.915 | 1611.4553 |
- | 7000 | 0.2828 | 354.7337 | 2500.7419 | 1.6117 | 36.2282 | 55.206 | 6.901 | 2297.4802 |
- | 8000 | 0.3232 | 326.1971 | 2137.8101 | 1.5573 | 36.0207 | 55.524 | 6.94 | 1784.0629 |
- | 9000 | 0.3636 | 300.6556 | 1956.3597 | 1.5080 | 36.2633 | 55.152 | 6.894 | 1863.9412 |
- | 10000 | 0.4040 | 277.5233 | 1719.5438 | 1.4690 | 36.6308 | 54.599 | 6.825 | 1311.5671 |
- | 11000 | 0.4444 | 259.9179 | 1497.8167 | 1.4308 | 36.3181 | 55.069 | 6.884 | 792.9883 |
- | 12000 | 0.4848 | 245.9564 | 1497.3939 | 1.3959 | 36.4046 | 54.938 | 6.867 | 1003.8881 |
- | 13000 | 0.5253 | 229.0869 | 1452.4731 | 1.3629 | 36.6598 | 54.556 | 6.819 | 1079.2407 |
- | 14000 | 0.5657 | 217.4221 | 1300.0883 | 1.3298 | 36.152 | 55.322 | 6.915 | 1190.3823 |
- | 15000 | 0.6061 | 205.3283 | 1161.2318 | 1.3009 | 36.1689 | 55.296 | 6.912 | 949.3873 |
- | 16000 | 0.6465 | 198.7860 | 1095.5355 | 1.2807 | 36.2562 | 55.163 | 6.895 | 1139.3685 |
- | 17000 | 0.6869 | 192.0938 | 1026.8737 | 1.2628 | 36.298 | 55.099 | 6.887 | 827.6089 |
- | 18000 | 0.7273 | 182.0580 | 986.5624 | 1.2441 | 36.1201 | 55.371 | 6.921 | 1010.3434 |
- | 19000 | 0.7677 | 178.0731 | 975.4955 | 1.2283 | 36.1677 | 55.298 | 6.912 | 872.3183 |
- | 20000 | 0.8081 | 175.3561 | 970.5560 | 1.2150 | 36.1615 | 55.307 | 6.913 | 865.3570 |
- | 21000 | 0.8485 | 171.6644 | 930.0918 | 1.2089 | 35.9764 | 55.592 | 6.949 | 832.4859 |
- | 22000 | 0.8889 | 168.4032 | 871.8605 | 1.1983 | 35.8999 | 55.71 | 6.964 | 733.7902 |
- | 23000 | 0.9293 | 167.6074 | 855.1790 | 1.1917 | 35.9055 | 55.702 | 6.963 | 772.9152 |
- | 24000 | 0.9697 | 166.0399 | 822.7090 | 1.1858 | 35.9815 | 55.584 | 6.948 | 620.6498 |
- | 24750 | 1.0 | 162.6707 | 952.2545 | 1.1803 | 36.0128 | 55.536 | 6.942 | 582.5044 |
+ | 0 | 0 | 54069.2930 | 57285.3438 | 7.1467 | 35.2037 | 56.812 | 7.102 | 54227.1016 |
+ | 1000 | 0.0404 | 835.9614 | 5114.5200 | 2.0979 | 35.1654 | 56.874 | 7.109 | 20046.5859 |
+ | 2000 | 0.0808 | 564.0770 | 3427.2224 | 1.8740 | 35.236 | 56.76 | 7.095 | 2653.2007 |
+ | 3000 | 0.1212 | 453.5656 | 2948.4802 | 1.7420 | 35.0308 | 57.093 | 7.137 | 1432.7913 |
+ | 4000 | 0.1616 | 389.6504 | 2661.5664 | 1.6437 | 35.0141 | 57.12 | 7.14 | 1010.8831 |
+ | 5000 | 0.2020 | 338.8218 | 2219.8342 | 1.5658 | 35.0667 | 57.034 | 7.129 | 954.4720 |
+ | 6000 | 0.2424 | 299.0258 | 1876.3831 | 1.4980 | 35.0748 | 57.021 | 7.128 | 1165.2183 |
+ | 7000 | 0.2828 | 273.8413 | 1630.5182 | 1.4478 | 35.1356 | 56.922 | 7.115 | 1019.0151 |
+ | 8000 | 0.3232 | 252.0468 | 1444.3036 | 1.3992 | 35.0678 | 57.032 | 7.129 | 876.0536 |
+ | 9000 | 0.3636 | 234.9056 | 1372.0399 | 1.3616 | 35.2036 | 56.812 | 7.102 | 581.4163 |
+ | 10000 | 0.4040 | 221.7531 | 1324.3276 | 1.3253 | 35.1731 | 56.862 | 7.108 | 536.0763 |
+ | 11000 | 0.4444 | 205.0097 | 1190.9154 | 1.2854 | 35.2444 | 56.747 | 7.093 | 729.7837 |
+ | 12000 | 0.4848 | 192.3177 | 1137.5690 | 1.2537 | 35.9424 | 55.645 | 6.956 | 630.9283 |
+ | 13000 | 0.5253 | 178.7797 | 1029.4834 | 1.2186 | 35.669 | 56.071 | 7.009 | 711.3105 |
+ | 14000 | 0.5657 | 171.1587 | 935.8806 | 1.1898 | 35.6704 | 56.069 | 7.009 | 608.0168 |
+ | 15000 | 0.6061 | 163.3289 | 894.3995 | 1.1697 | 35.7337 | 55.97 | 6.996 | 592.7048 |
+ | 16000 | 0.6465 | 161.3999 | 839.2896 | 1.1530 | 35.726 | 55.982 | 6.998 | 561.4263 |
+ | 17000 | 0.6869 | 153.9090 | 816.5835 | 1.1376 | 35.6037 | 56.174 | 7.022 | 552.1324 |
+ | 18000 | 0.7273 | 153.1935 | 798.4223 | 1.1286 | 38.2137 | 52.337 | 6.542 | 561.0514 |
+ | 19000 | 0.7677 | 149.5150 | 771.6359 | 1.1168 | 35.6863 | 56.044 | 7.005 | 403.5265 |
+ | 20000 | 0.8081 | 145.0661 | 746.0110 | 1.1048 | 35.6993 | 56.024 | 7.003 | 399.0254 |
+ | 21000 | 0.8485 | 143.9999 | 710.9376 | 1.0945 | 35.6597 | 56.086 | 7.011 | 650.4333 |
+ | 22000 | 0.8889 | 142.8749 | 695.1750 | 1.0892 | 35.7386 | 55.962 | 6.995 | 397.0057 |
+ | 23000 | 0.9293 | 140.7277 | 699.8470 | 1.0831 | 35.4866 | 56.359 | 7.045 | 324.7243 |
+ | 24000 | 0.9697 | 138.0864 | 705.5446 | 1.0736 | 35.6884 | 56.041 | 7.005 | 355.2115 |
+ | 24750 | 1.0 | 137.5085 | 729.4709 | 1.0760 | 35.8247 | 55.827 | 6.978 | 660.5873 |

  ### Framework versions
  - Distily 0.2.0
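The substantive change in this commit is the hidden-state objective: `hs_loss_fn` moves from `jsd_loss` to `kl_divergence_loss` (the logits term was already KL). Below is a minimal, self-contained sketch of the two divergences in plain Python over probability lists; it is not Distily's actual tensor implementation, and the example logits are made up:

```python
import math

def softmax(logits):
    # Numerically stable softmax: shift by the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q): asymmetric and unbounded; heavily penalizes the student q
    # for putting little mass where the teacher p puts a lot.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    # Jensen-Shannon divergence: symmetrized KL against the mixture,
    # bounded above by log(2).
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)

teacher = softmax([2.0, 1.0, 0.1])   # made-up teacher logits
student = softmax([1.5, 1.2, 0.3])   # made-up student logits

print(f"KL  = {kl_divergence(teacher, student):.4f}")
print(f"JSD = {jsd(teacher, student):.4f}")
```

Unlike JSD, KL is unbounded and asymmetric, which changes how strongly the student is pulled toward the teacher's distribution; in the card's `MultiObjective` the hidden-state term is further scaled by `hs_weight=0.2` relative to the logits term.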
logs/distillation_objective=MultiObjective(logits_weight_1__logits_loss_fn_(fn_kl_divergence_loss())__hs_weight_0.2__hs_loss_fn_(fn_kl_divergence_loss())__attn_weight_0__attn_loss_fn_(fn_soft_mse_loss()))/events.out.tfevents.1723642674.93d6cbb3ad53 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:34e963fd408db13313bbcc16a8abe43d7f22e52ae6afc3a39b02a9816340dddd
+ size 253
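For readers of the `eval_*ppl` columns in the card above: each is a perplexity, i.e. the exponential of the mean per-token negative log-likelihood on that Wikipedia slice (lower is better). A minimal sketch of the relationship; the numbers below are illustrative, not taken from the card's eval pipeline:

```python
import math

def perplexity(token_nlls):
    # Perplexity = exp(mean negative log-likelihood): the effective
    # per-token branching factor the model behaves as if choosing from.
    return math.exp(sum(token_nlls) / len(token_nlls))

# Illustrative: a model assigning every token probability 1/235 has
# perplexity 235 (roughly the scale of this card's eval_enwikippl).
nll = -math.log(1 / 235)
print(round(perplexity([nll] * 10), 4))  # 235.0
```

On this scale, the gap between the teacher's 30.2086 enwikippl and the student's 137.5085 at the final step quantifies how much language-modeling quality the distilled model still gives up.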