lapp0 committed on
Commit 4ecf2ad · verified · 1 Parent(s): b0cc48c

End of training

README.md CHANGED
@@ -16,13 +16,13 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
  It achieves the following results on the evaluation set:
- - eval_enwikippl: 1215.4991
- - eval_frwikippl: 5819.5083
- - eval_zhwikippl: 20956.7344
- - eval_loss: 8772.9277
- - eval_runtime: 21.4547
- - eval_samples_per_second: 46.61
- - eval_steps_per_second: 11.652
+ - eval_enwikippl: 1317.8882
+ - eval_frwikippl: 6160.0112
+ - eval_zhwikippl: 18720.5391
+ - eval_loss: 9100.8643
+ - eval_runtime: 21.7479
+ - eval_samples_per_second: 45.982
+ - eval_steps_per_second: 11.495
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
@@ -59,29 +59,29 @@ The following hyperparameters were used during training:
  - num_epochs: 1.0
 
  ### Resource Usage
- Peak GPU Memory: 4.5042 GB
+ Peak GPU Memory: 4.5037 GB
 
  ### Eval-Phase Metrics
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 30.2385 | 57.2728 | | | | | 18.1772 |
- | 0 | 0 | 57156.2305 | 56794.7344 | 340312.0625 | 21.3694 | 46.796 | 11.699 | 53180.0820 |
- | 500 | 0.0808 | 2484.4006 | 11045.3506 | 11426.8164 | 21.3475 | 46.844 | 11.711 | 43964.9844 |
- | 1000 | 0.1616 | 1970.1313 | 8537.9531 | 10323.5195 | 21.3627 | 46.811 | 11.703 | 33413.9570 |
- | 1500 | 0.2424 | 1787.3650 | 7996.6372 | 10073.0879 | 21.7084 | 46.065 | 11.516 | 31035.3789 |
- | 2000 | 0.3232 | 1657.3257 | 6987.1538 | 9678.4004 | 21.3204 | 46.903 | 11.726 | 25568.5762 |
- | 2500 | 0.4040 | 1540.4508 | 6767.0361 | 9425.2803 | 21.6179 | 46.258 | 11.564 | 24918.0391 |
- | 3000 | 0.4848 | 1476.4456 | 6392.7534 | 9441.3438 | 21.3757 | 46.782 | 11.696 | 22015.2520 |
- | 3500 | 0.5657 | 1410.0809 | 6415.7793 | 9184.1279 | 21.3028 | 46.942 | 11.736 | 22942.6816 |
- | 4000 | 0.6465 | 1353.9352 | 6457.0771 | 9045.5684 | 21.3793 | 46.774 | 11.694 | 23314.8477 |
- | 4500 | 0.7273 | 1299.9990 | 5976.0591 | 8900.2881 | 21.3265 | 46.89 | 11.723 | 20214.6074 |
- | 5000 | 0.8081 | 1277.3837 | 6074.4014 | 8813.1836 | 21.3486 | 46.841 | 11.71 | 21447.9414 |
- | 5500 | 0.8889 | 1249.9579 | 6053.8770 | 8707.7764 | 21.3581 | 46.821 | 11.705 | 22251.7031 |
- | 6000 | 0.9697 | 1205.8865 | 5761.5308 | 8635.9678 | 21.251 | 47.057 | 11.764 | 20124.375 |
- | 6187 | 0.9999 | 1215.4991 | 5819.5083 | 8772.9277 | 21.4547 | 46.61 | 11.652 | 20956.7344 |
+ | 0 | 0 | 56316.9727 | 57063.6406 | 338362.375 | 21.585 | 46.329 | 11.582 | 59895.3867 |
+ | 500 | 0.0808 | 2572.3994 | 11417.5068 | 11543.2324 | 21.6004 | 46.295 | 11.574 | 41503.2422 |
+ | 1000 | 0.1616 | 2063.0137 | 9092.1670 | 10577.4082 | 21.6631 | 46.161 | 11.54 | 35115.7461 |
+ | 1500 | 0.2424 | 1871.0499 | 7938.2212 | 10358.4639 | 21.603 | 46.29 | 11.572 | 27292.1016 |
+ | 2000 | 0.3232 | 1695.2686 | 7227.6602 | 9993.9844 | 21.6259 | 46.241 | 11.56 | 23182.9023 |
+ | 2500 | 0.4040 | 1612.1370 | 6837.5381 | 9819.5840 | 21.6071 | 46.281 | 11.57 | 20554.8633 |
+ | 3000 | 0.4848 | 1560.7086 | 6503.2227 | 9719.8721 | 21.6469 | 46.196 | 11.549 | 18931.7129 |
+ | 3500 | 0.5657 | 1508.6647 | 6356.7939 | 9568.1279 | 21.5545 | 46.394 | 11.599 | 18264.9355 |
+ | 4000 | 0.6465 | 1453.8334 | 6368.4561 | 9422.2725 | 21.6702 | 46.146 | 11.537 | 18620.8105 |
+ | 4500 | 0.7273 | 1410.3822 | 6362.3979 | 9391.8076 | 21.643 | 46.204 | 11.551 | 19780.6777 |
+ | 5000 | 0.8081 | 1377.3173 | 6155.8887 | 9252.1602 | 21.6684 | 46.15 | 11.538 | 19119.7227 |
+ | 5500 | 0.8889 | 1357.9893 | 6214.3193 | 9214.1436 | 21.8033 | 45.865 | 11.466 | 18457.4102 |
+ | 6000 | 0.9697 | 1323.8629 | 6020.0356 | 9141.9521 | 21.6876 | 46.109 | 11.527 | 17406.4805 |
+ | 6187 | 0.9999 | 1317.8882 | 6160.0112 | 9100.8643 | 21.7479 | 45.982 | 11.495 | 18720.5391 |
 
  ### Framework versions
- - Distily 0.1.0
+ - Distily 0.2.0
  - Transformers 4.44.0
  - Pytorch 2.3.0
  - Datasets 2.20.0
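The enwikippl, frwikippl, and zhwikippl columns above are perplexities on English, French, and Chinese Wikipedia text, so lower is better and the teacher-eval row is the target to approach. As a rough illustration only (this is not Distily's actual evaluation code, and the corpus handling is simplified), a perplexity of this kind is conventionally the exponential of the mean token-level cross-entropy:

```python
# Hedged sketch: perplexity = exp(mean token cross-entropy).
# Not Distily's evaluation code; corpus handling is simplified.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_name: str, texts: list[str]) -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True)
            # Passing labels makes the model return the mean
            # cross-entropy over the shifted next-token predictions.
            out = model(**enc, labels=enc["input_ids"])
            n = enc["input_ids"].size(1) - 1  # tokens actually predicted
            total_nll += out.loss.item() * n
            total_tokens += n
    return float(torch.exp(torch.tensor(total_nll / total_tokens)))

print(perplexity("gpt2", ["The quick brown fox jumps over the lazy dog."]))
```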
runs/Aug10_05-37-44_93d6cbb3ad53/events.out.tfevents.1723272569.93d6cbb3ad53 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2690431ffa6cc4dcc5d737096e31d1998e827abd709d238eeaf8269c9aeb7da7
+ size 249
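The file added above is a Git LFS pointer rather than the TensorBoard event file itself: the `version`, `oid`, and `size` lines identify the real blob, which lives in LFS storage and is fetched on checkout. A minimal sketch of reading such a pointer (illustrative only; `parse_lfs_pointer` is a hypothetical helper, not a git-lfs API):

```python
# Hedged sketch: parse the key-value lines of a Git LFS pointer file.
# `parse_lfs_pointer` is a hypothetical helper, not part of git-lfs.
def parse_lfs_pointer(text: str) -> dict[str, str]:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:2690431ffa6cc4dcc5d737096e31d1998e827abd709d238eeaf8269c9aeb7da7\n"
    "size 249\n"
)
print(parse_lfs_pointer(pointer))
# {'version': 'https://git-lfs.github.com/spec/v1', 'oid': 'sha256:2690...', 'size': '249'}
```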