lapp0 committed on
Commit 4ecf2ad · verified · 1 Parent(s): b0cc48c

End of training

README.md CHANGED
@@ -16,13 +16,13 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
  It achieves the following results on the evaluation set:
- - eval_enwikippl: 1215.4991
- - eval_frwikippl: 5819.5083
- - eval_zhwikippl: 20956.7344
- - eval_loss: 8772.9277
- - eval_runtime: 21.4547
- - eval_samples_per_second: 46.61
- - eval_steps_per_second: 11.652
+ - eval_enwikippl: 1317.8882
+ - eval_frwikippl: 6160.0112
+ - eval_zhwikippl: 18720.5391
+ - eval_loss: 9100.8643
+ - eval_runtime: 21.7479
+ - eval_samples_per_second: 45.982
+ - eval_steps_per_second: 11.495
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
@@ -59,29 +59,29 @@ The following hyperparameters were used during training:
  - num_epochs: 1.0
 
  ### Resource Usage
- Peak GPU Memory: 4.5042 GB
+ Peak GPU Memory: 4.5037 GB
 
  ### Eval-Phase Metrics
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 30.2385 | 57.2728 | | | | | 18.1772 |
- | 0 | 0 | 57156.2305 | 56794.7344 | 340312.0625 | 21.3694 | 46.796 | 11.699 | 53180.0820 |
- | 500 | 0.0808 | 2484.4006 | 11045.3506 | 11426.8164 | 21.3475 | 46.844 | 11.711 | 43964.9844 |
- | 1000 | 0.1616 | 1970.1313 | 8537.9531 | 10323.5195 | 21.3627 | 46.811 | 11.703 | 33413.9570 |
- | 1500 | 0.2424 | 1787.3650 | 7996.6372 | 10073.0879 | 21.7084 | 46.065 | 11.516 | 31035.3789 |
- | 2000 | 0.3232 | 1657.3257 | 6987.1538 | 9678.4004 | 21.3204 | 46.903 | 11.726 | 25568.5762 |
- | 2500 | 0.4040 | 1540.4508 | 6767.0361 | 9425.2803 | 21.6179 | 46.258 | 11.564 | 24918.0391 |
- | 3000 | 0.4848 | 1476.4456 | 6392.7534 | 9441.3438 | 21.3757 | 46.782 | 11.696 | 22015.2520 |
- | 3500 | 0.5657 | 1410.0809 | 6415.7793 | 9184.1279 | 21.3028 | 46.942 | 11.736 | 22942.6816 |
- | 4000 | 0.6465 | 1353.9352 | 6457.0771 | 9045.5684 | 21.3793 | 46.774 | 11.694 | 23314.8477 |
- | 4500 | 0.7273 | 1299.9990 | 5976.0591 | 8900.2881 | 21.3265 | 46.89 | 11.723 | 20214.6074 |
- | 5000 | 0.8081 | 1277.3837 | 6074.4014 | 8813.1836 | 21.3486 | 46.841 | 11.71 | 21447.9414 |
- | 5500 | 0.8889 | 1249.9579 | 6053.8770 | 8707.7764 | 21.3581 | 46.821 | 11.705 | 22251.7031 |
- | 6000 | 0.9697 | 1205.8865 | 5761.5308 | 8635.9678 | 21.251 | 47.057 | 11.764 | 20124.375 |
- | 6187 | 0.9999 | 1215.4991 | 5819.5083 | 8772.9277 | 21.4547 | 46.61 | 11.652 | 20956.7344 |
+ | 0 | 0 | 56316.9727 | 57063.6406 | 338362.375 | 21.585 | 46.329 | 11.582 | 59895.3867 |
+ | 500 | 0.0808 | 2572.3994 | 11417.5068 | 11543.2324 | 21.6004 | 46.295 | 11.574 | 41503.2422 |
+ | 1000 | 0.1616 | 2063.0137 | 9092.1670 | 10577.4082 | 21.6631 | 46.161 | 11.54 | 35115.7461 |
+ | 1500 | 0.2424 | 1871.0499 | 7938.2212 | 10358.4639 | 21.603 | 46.29 | 11.572 | 27292.1016 |
+ | 2000 | 0.3232 | 1695.2686 | 7227.6602 | 9993.9844 | 21.6259 | 46.241 | 11.56 | 23182.9023 |
+ | 2500 | 0.4040 | 1612.1370 | 6837.5381 | 9819.5840 | 21.6071 | 46.281 | 11.57 | 20554.8633 |
+ | 3000 | 0.4848 | 1560.7086 | 6503.2227 | 9719.8721 | 21.6469 | 46.196 | 11.549 | 18931.7129 |
+ | 3500 | 0.5657 | 1508.6647 | 6356.7939 | 9568.1279 | 21.5545 | 46.394 | 11.599 | 18264.9355 |
+ | 4000 | 0.6465 | 1453.8334 | 6368.4561 | 9422.2725 | 21.6702 | 46.146 | 11.537 | 18620.8105 |
+ | 4500 | 0.7273 | 1410.3822 | 6362.3979 | 9391.8076 | 21.643 | 46.204 | 11.551 | 19780.6777 |
+ | 5000 | 0.8081 | 1377.3173 | 6155.8887 | 9252.1602 | 21.6684 | 46.15 | 11.538 | 19119.7227 |
+ | 5500 | 0.8889 | 1357.9893 | 6214.3193 | 9214.1436 | 21.8033 | 45.865 | 11.466 | 18457.4102 |
+ | 6000 | 0.9697 | 1323.8629 | 6020.0356 | 9141.9521 | 21.6876 | 46.109 | 11.527 | 17406.4805 |
+ | 6187 | 0.9999 | 1317.8882 | 6160.0112 | 9100.8643 | 21.7479 | 45.982 | 11.495 | 18720.5391 |
 
  ### Framework versions
- - Distily 0.1.0
+ - Distily 0.2.0
  - Transformers 4.44.0
  - Pytorch 2.3.0
  - Datasets 2.20.0
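The enwikippl, frwikippl, and zhwikippl columns above are perplexities on English, French, and Chinese Wikipedia text, so lower is better and the teacher-eval row is the target to approach. As a rough illustration only (this is not Distily's actual evaluation code, and the corpus handling is simplified), a perplexity of this kind is conventionally the exponential of the mean token-level cross-entropy:

```python
# Hedged sketch: perplexity = exp(mean token cross-entropy).
# Not Distily's evaluation code; corpus handling is simplified.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_name: str, texts: list[str]) -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True)
            # Passing labels makes the model return the mean
            # cross-entropy over the shifted next-token predictions.
            out = model(**enc, labels=enc["input_ids"])
            n = enc["input_ids"].size(1) - 1  # tokens actually predicted
            total_nll += out.loss.item() * n
            total_tokens += n
    return float(torch.exp(torch.tensor(total_nll / total_tokens)))

print(perplexity("gpt2", ["The quick brown fox jumps over the lazy dog."]))
```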
runs/Aug10_05-37-44_93d6cbb3ad53/events.out.tfevents.1723272569.93d6cbb3ad53 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2690431ffa6cc4dcc5d737096e31d1998e827abd709d238eeaf8269c9aeb7da7
+ size 249
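The file added above is a Git LFS pointer rather than the TensorBoard event file itself: the `version`, `oid`, and `size` lines identify the real blob, which lives in LFS storage and is fetched on checkout. A minimal sketch of reading such a pointer (illustrative only; `parse_lfs_pointer` is a hypothetical helper, not a git-lfs API):

```python
# Hedged sketch: parse the key-value lines of a Git LFS pointer file.
# `parse_lfs_pointer` is a hypothetical helper, not part of git-lfs.
def parse_lfs_pointer(text: str) -> dict[str, str]:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:2690431ffa6cc4dcc5d737096e31d1998e827abd709d238eeaf8269c9aeb7da7\n"
    "size 249\n"
)
print(parse_lfs_pointer(pointer))
# {'version': 'https://git-lfs.github.com/spec/v1', 'oid': 'sha256:2690...', 'size': '249'}
```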