lapp0 committed
Commit 82696e3 · verified · 1 Parent(s): dddd56e

End of training

README.md CHANGED
@@ -16,13 +16,13 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
- - eval_enwikippl: 213.1260
- - eval_frwikippl: 1238.3538
- - eval_zhwikippl: 689.7033
- - eval_loss: 1.2684
- - eval_runtime: 33.9389
- - eval_samples_per_second: 58.929
- - eval_steps_per_second: 7.366
+ - eval_enwikippl: 433.0859
+ - eval_frwikippl: 2823.5620
+ - eval_zhwikippl: 4932.8379
+ - eval_loss: 21.1035
+ - eval_runtime: 34.4485
+ - eval_samples_per_second: 58.058
+ - eval_steps_per_second: 7.257
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -45,7 +45,7 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
- - distillation_objective: MultiObjective(logits_weight=1, logits_loss_fn=(fn:kl_divergence_loss()), activations_weight=0, activations_loss_fn=(fn:mse_loss()), attentions_weight=0, attentions_loss_fn=(fn:mse_loss()))
+ - distillation_objective: MultiObjective(logits_weight=1, logits_loss_fn=(fn:kl_divergence_loss()), activations_weight=0.1, activations_loss_fn=(fn:mse_loss()), attentions_weight=0, attentions_loss_fn=(fn:mse_loss()))
 - train_embeddings: True
 - learning_rate: 4e-05
 - train_batch_size: 8
@@ -56,38 +56,38 @@ The following hyperparameters were used during training:
 - num_epochs: 1.0
 
 ### Resource Usage
- Peak GPU Memory: 7.9371 GB
+ Peak GPU Memory: 8.0893 GB
 
 ### Eval-Phase Metrics
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 30.2086 | 57.2728 | | | | | 18.1784 |
- | 0 | 0 | 57983.2695 | 56826.7539 | 5.9504 | 33.9223 | 58.958 | 7.37 | 51544.0508 |
- | 1000 | 0.0404 | 716.3218 | 4663.2852 | 1.9522 | 34.1014 | 58.649 | 7.331 | 17271.0391 |
- | 2000 | 0.0808 | 512.1357 | 3224.2202 | 1.7690 | 34.1187 | 58.619 | 7.327 | 2109.2849 |
- | 3000 | 0.1212 | 418.9938 | 2658.5667 | 1.6652 | 34.1292 | 58.601 | 7.325 | 1129.3704 |
- | 4000 | 0.1616 | 367.4342 | 2491.9417 | 1.5763 | 34.0919 | 58.665 | 7.333 | 798.7274 |
- | 5000 | 0.2020 | 317.3523 | 1897.4025 | 1.4963 | 33.965 | 58.884 | 7.361 | 962.9218 |
- | 6000 | 0.2424 | 282.9857 | 1585.8464 | 1.4222 | 33.9768 | 58.864 | 7.358 | 852.0554 |
- | 7000 | 0.2828 | 251.4994 | 1421.8730 | 1.3623 | 33.9388 | 58.93 | 7.366 | 753.7527 |
- | 8000 | 0.3232 | 229.7460 | 1314.6521 | 1.3137 | 34.0289 | 58.773 | 7.347 | 729.5888 |
- | 9000 | 0.3636 | 213.1260 | 1238.3538 | 1.2684 | 33.9389 | 58.929 | 7.366 | 689.7033 |
- | 10000 | 0.4040 | 197.5243 | 1147.7201 | 1.2172 | 34.1028 | 58.646 | 7.331 | 761.6445 |
- | 11000 | 0.4444 | 178.5023 | 1065.9717 | 1.1681 | 34.111 | 58.632 | 7.329 | 697.0179 |
- | 12000 | 0.4848 | 164.3850 | 941.9713 | 1.1267 | 34.1042 | 58.644 | 7.33 | 722.8970 |
- | 13000 | 0.5253 | 157.2920 | 871.0618 | 1.0965 | 34.1353 | 58.59 | 7.324 | 484.9227 |
- | 14000 | 0.5657 | 150.8093 | 806.3426 | 1.0674 | 34.0619 | 58.717 | 7.34 | 539.5954 |
- | 15000 | 0.6061 | 143.2526 | 816.5259 | 1.0499 | 34.2668 | 58.366 | 7.296 | 509.8925 |
- | 16000 | 0.6465 | 139.8671 | 715.0598 | 1.0314 | 34.0375 | 58.759 | 7.345 | 426.2927 |
- | 17000 | 0.6869 | 134.8648 | 739.3088 | 1.0151 | 34.0663 | 58.709 | 7.339 | 458.1682 |
- | 18000 | 0.7273 | 132.5907 | 675.8909 | 1.0007 | 33.9807 | 58.857 | 7.357 | 348.7257 |
- | 19000 | 0.7677 | 129.5074 | 665.1128 | 0.9937 | 34.017 | 58.794 | 7.349 | 350.5464 |
- | 20000 | 0.8081 | 127.9778 | 683.8963 | 0.9837 | 33.9292 | 58.946 | 7.368 | 395.9997 |
- | 21000 | 0.8485 | 125.7319 | 659.5090 | 0.9754 | 33.985 | 58.849 | 7.356 | 518.3367 |
- | 22000 | 0.8889 | 124.8950 | 691.0702 | 0.9696 | 34.2015 | 58.477 | 7.31 | 610.1314 |
- | 23000 | 0.9293 | 123.7751 | 644.4776 | 0.9625 | 34.1656 | 58.538 | 7.317 | 321.7459 |
- | 24000 | 0.9697 | 122.1613 | 658.5797 | 0.9586 | 33.975 | 58.867 | 7.358 | 353.6970 |
- | 24750 | 1.0 | 119.9802 | 652.2029 | 0.9537 | 34.2146 | 58.455 | 7.307 | 339.4447 |
+ | 0 | 0 | 54069.2930 | 57285.3438 | 69.6280 | 34.3114 | 58.29 | 7.286 | 54227.1016 |
+ | 1000 | 0.0404 | 1149.4497 | 6758.9292 | 22.9270 | 34.3626 | 58.203 | 7.275 | 55191.4258 |
+ | 2000 | 0.0808 | 848.3209 | 5094.3662 | 22.2020 | 34.3795 | 58.174 | 7.272 | 14284.0166 |
+ | 3000 | 0.1212 | 700.4797 | 4480.8540 | 21.8288 | 34.371 | 58.189 | 7.274 | 7045.9990 |
+ | 4000 | 0.1616 | 615.9059 | 3635.8176 | 21.5565 | 34.4355 | 58.08 | 7.26 | 3316.0488 |
+ | 5000 | 0.2020 | 556.0313 | 3492.5959 | 21.4455 | 34.3262 | 58.265 | 7.283 | 4788.7505 |
+ | 6000 | 0.2424 | 528.5394 | 3328.1577 | 21.2810 | 34.3681 | 58.193 | 7.274 | 3058.2744 |
+ | 7000 | 0.2828 | 479.2375 | 2988.6665 | 21.2197 | 34.3863 | 58.163 | 7.27 | 3689.9192 |
+ | 8000 | 0.3232 | 448.9053 | 2847.9541 | 21.0785 | 34.5149 | 57.946 | 7.243 | 1743.5521 |
+ | 9000 | 0.3636 | 433.0859 | 2823.5620 | 21.1035 | 34.4485 | 58.058 | 7.257 | 4932.8379 |
+ | 10000 | 0.4040 | 423.8369 | 2843.9414 | 21.0105 | 34.4298 | 58.089 | 7.261 | 3959.4795 |
+ | 11000 | 0.4444 | 394.3074 | 2524.8374 | 20.9575 | 34.5178 | 57.941 | 7.243 | 6243.0879 |
+ | 12000 | 0.4848 | 385.4673 | 2595.5920 | 20.9185 | 34.4535 | 58.049 | 7.256 | 17321.8613 |
+ | 13000 | 0.5253 | 369.9537 | 2477.9255 | 20.8475 | 34.4953 | 57.979 | 7.247 | 2443.6860 |
+ | 14000 | 0.5657 | 358.8618 | 2519.8567 | 20.7897 | 34.9016 | 57.304 | 7.163 | 3639.9983 |
+ | 15000 | 0.6061 | 343.0577 | 2395.4692 | 20.7710 | 34.3143 | 58.285 | 7.286 | 1816.2738 |
+ | 16000 | 0.6465 | 343.8312 | 2195.5515 | 20.7428 | 34.184 | 58.507 | 7.313 | 14709.8760 |
+ | 17000 | 0.6869 | 336.7496 | 2234.2798 | 20.7590 | 34.4691 | 58.023 | 7.253 | 6489.5991 |
+ | 18000 | 0.7273 | 338.3747 | 2191.5310 | 20.6583 | 34.4634 | 58.033 | 7.254 | 2819.0298 |
+ | 19000 | 0.7677 | 324.3280 | 2071.9238 | 20.6345 | 34.4307 | 58.088 | 7.261 | 3877.8486 |
+ | 20000 | 0.8081 | 315.1911 | 2056.7864 | 20.5710 | 34.2186 | 58.448 | 7.306 | 3151.9771 |
+ | 21000 | 0.8485 | 315.4604 | 2161.1489 | 20.5432 | 34.5086 | 57.957 | 7.245 | 3105.1853 |
+ | 22000 | 0.8889 | 324.6304 | 1950.2999 | 20.6125 | 34.2565 | 58.383 | 7.298 | 2055.8921 |
+ | 23000 | 0.9293 | 313.9452 | 1958.0153 | 20.5900 | 34.5413 | 57.902 | 7.238 | 4405.8896 |
+ | 24000 | 0.9697 | 311.3475 | 1918.9283 | 20.5405 | 34.2718 | 58.357 | 7.295 | 11800.9756 |
+ | 24750 | 1.0 | 303.2348 | 1956.3597 | 20.4700 | 34.3296 | 58.259 | 7.282 | 15104.0020 |
 
 ### Framework versions
 - Distily 0.2.0
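The only substantive configuration change above is `activations_weight`, which moves from 0 to 0.1 in the `distillation_objective`. As a rough illustration of what such a multi-objective loss computes, here is a minimal PyTorch sketch; the function name, tensor shapes, and layer pairing are illustrative assumptions, not Distily's actual API:

```python
import torch
import torch.nn.functional as F

def multi_objective_loss(student_logits, teacher_logits,
                         student_hidden, teacher_hidden,
                         logits_weight=1.0, activations_weight=0.1):
    # KL divergence between teacher and student next-token distributions,
    # mirroring logits_loss_fn=kl_divergence_loss with logits_weight=1.
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),  # student log-probs
        F.softmax(teacher_logits, dim=-1),      # teacher probs
        reduction="batchmean",
    )
    # MSE between hidden-state activations, mirroring
    # activations_loss_fn=mse_loss with the new activations_weight=0.1.
    mse = F.mse_loss(student_hidden, teacher_hidden)
    # attentions_weight is 0 in this run, so no attention-MSE term appears.
    return logits_weight * kl + activations_weight * mse

# Toy shapes: batch 2, sequence 4, vocab 10, hidden size 8.
s_log, t_log = torch.randn(2, 4, 10), torch.randn(2, 4, 10)
s_act, t_act = torch.randn(2, 4, 8), torch.randn(2, 4, 8)
print(multi_objective_loss(s_log, t_log, s_act, t_act))
```

With `attentions_weight=0`, this run optimizes the KL logits term plus a lightly weighted hidden-state MSE; the eval tables above suggest the added activation term changed the loss scale and, in this run, worsened the perplexities relative to the previous configuration.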
 
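Peak GPU memory also rises from 7.9371 GB to 8.0893 GB, plausibly because the activation-matching term keeps extra hidden-state tensors alive. A minimal sketch of how such a peak figure can be measured in PyTorch; the workload below is a hypothetical stand-in, since the commit does not show how Distily records it:

```python
import torch

torch.cuda.reset_peak_memory_stats()

# Hypothetical workload standing in for one distillation step.
x = torch.randn(4096, 4096, device="cuda", requires_grad=True)
(x @ x).sum().backward()

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```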
logs/distillation_objective=MultiObjective(logits_weight_1__logits_loss_fn_(fn_kl_divergence_loss())__activations_weight_0.1__activations_loss_fn_(fn_mse_loss())__attentions_weight_0__attentions_loss_fn_(f/events.out.tfevents.1723454082.93d6cbb3ad53 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4959a9e5649fc069aa3e5f9ccef01630bd6f0527a058567ebd3aaf8c73560990
+ size 253
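For reference when reading the eval tables: `enwikippl`, `frwikippl`, and `zhwikippl` are perplexities on English, French, and Chinese Wikipedia text, where lower is better and the teacher's 30.2086 enwikippl is the reference point. A minimal sketch of how a causal-LM perplexity can be computed with `transformers`, assuming the standard shifted cross-entropy loss (the exact eval corpus and batching are not shown in this commit):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str) -> float:
    # Perplexity = exp(mean next-token cross-entropy over the sequence).
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the shifted LM loss.
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

# The teacher here is gpt2; the table's 30.2086 enwikippl is a corpus-level
# average, so a single sentence will not reproduce it exactly.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(perplexity(model, tokenizer, "The quick brown fox jumps over the lazy dog."))
```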