---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.10_gpt2
    results: []
---

distily_bench_obj_cross_v2.10_gpt2

This student model was distilled from the teacher model gpt2 on an unspecified dataset.

The Distily library was used for this distillation.
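
The training code itself is not included in this card. Purely as an illustration of the recipe (a frozen gpt2 teacher, a student trained to match its output distribution), here is a minimal PyTorch/Transformers sketch using the optimizer settings listed under the training hyperparameters below; it is a generic sketch, not Distily's actual API, and the input text is a placeholder.

```python
# Illustration only: a generic KL-on-logits distillation step in plain
# PyTorch/Transformers. This is NOT Distily's actual API.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher = AutoModelForCausalLM.from_pretrained("gpt2").eval()
student = AutoModelForCausalLM.from_pretrained("gpt2")  # student init: an assumption
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Optimizer settings taken from the hyperparameters listed below.
optimizer = torch.optim.Adam(student.parameters(), lr=4e-6,
                             betas=(0.9, 0.999), eps=1e-8)

batch = tokenizer("Placeholder training text.", return_tensors="pt")
with torch.no_grad():  # the teacher is frozen
    teacher_logits = teacher(**batch).logits
student_logits = student(**batch).logits

# Forward KL between the teacher's and student's token distributions.
loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                F.softmax(teacher_logits, dim=-1),
                reduction="batchmean")

optimizer.zero_grad()
loss.backward()
optimizer.step()
```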

It achieves the following results on the evaluation set:

  • eval_enwikippl: 475.6911
  • eval_frwikippl: 1883.2976
  • eval_zhwikippl: 753.3747
  • eval_tinystoriesppl: 508.0900
  • eval_loss: 1.6984
  • eval_runtime: 21.181
  • eval_samples_per_second: 47.212
  • eval_steps_per_second: 11.803
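
For reference, a perplexity figure is the exponential of an average per-token cross-entropy; the eval_loss above appears to be the distillation objective (KL, per the hyperparameters below) rather than the language-modeling loss, so it is not simply the logarithm of these perplexities. A minimal sketch of the standard computation, assuming ordinary token-level cross-entropy (the card does not specify how the per-dataset metrics were tokenized or aggregated):

```python
# Perplexity = exp(mean token-level cross-entropy). The text below is a
# placeholder, not the card's actual evaluation data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")
with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity: {torch.exp(loss).item():.4f}")
```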

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None)); this objective is unpacked in the note after this list
  • train_embeddings: True
  • learning_rate: 4e-06
  • train_batch_size: 1
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1.0
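
Reading the distillation_objective entry above: only the logits component is active (weight 1, KL loss); the hidden-state (hs) and attention (attn) components have weight 0 and are disabled. With $z_s$ and $z_t$ denoting student and teacher logits (notation assumed here, not from the card), the objective reduces to

$$
\mathcal{L} = 1 \cdot \mathrm{KL}\big(\mathrm{softmax}(z_t)\,\big\|\,\mathrm{softmax}(z_s)\big) + 0 \cdot \mathcal{L}_{\text{hs}} + 0 \cdot \mathcal{L}_{\text{attn}} = \mathrm{KL}\big(\mathrm{softmax}(z_t)\,\big\|\,\mathrm{softmax}(z_s)\big).
$$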

Resource Usage

Peak GPU Memory: 3.9285 GB
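
The card does not say how this figure was obtained; one common way to read peak GPU memory in PyTorch (a sketch, assuming a single CUDA device) is:

```python
# A common way to read peak GPU memory in PyTorch; assumes a CUDA device.
import torch

torch.cuda.reset_peak_memory_stats()
# ... run training / evaluation here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```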

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples/s | steps/s | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 270.2348 | 76.8142 | | | | | 671.1238 | 22.8030 |
| 0 | 0 | 120078.375 | 1867851235328.0 | 18.7920 | 21.4263 | 46.672 | 11.668 | 72.8770 | 4013754155008.0 |
| 5000 | 0.0505 | 463.2989 | 2220.3191 | 1.7597 | 21.4296 | 46.664 | 11.666 | 465.7518 | 1000.1456 |
| 10000 | 0.1010 | 482.0157 | 1842.5612 | 1.7109 | 21.3947 | 46.741 | 11.685 | 518.0355 | 839.0040 |
| 15000 | 0.1515 | 474.3848 | 1847.8888 | 1.7034 | 21.3589 | 46.819 | 11.705 | 506.2872 | 714.8962 |
| 20000 | 0.2020 | 475.7832 | 1838.8660 | 1.7042 | 21.391 | 46.749 | 11.687 | 508.0481 | 751.2170 |
| 25000 | 0.2525 | 494.6881 | 1854.0813 | 1.7025 | 21.5144 | 46.481 | 11.62 | 541.3933 | 699.7526 |
| 30000 | 0.3030 | 470.2776 | 1916.4137 | 1.7054 | 21.5321 | 46.442 | 11.611 | 500.7306 | 709.5754 |
| 35000 | 0.3535 | 489.6271 | 1885.3552 | 1.7014 | 21.4509 | 46.618 | 11.655 | 532.2521 | 748.5163 |
| 40000 | 0.4040 | 516.1402 | 1929.0752 | 1.7038 | 21.2233 | 47.118 | 11.78 | 576.5936 | 754.3302 |
| 45000 | 0.4545 | 456.2205 | 1928.9390 | 1.7070 | 21.1932 | 47.185 | 11.796 | 478.0193 | 808.4593 |
| 50000 | 0.5051 | 505.5353 | 1946.6123 | 1.7023 | 21.2286 | 47.106 | 11.777 | 557.7241 | 780.4835 |
| 55000 | 0.5556 | 472.1027 | 1931.3862 | 1.7005 | 21.2292 | 47.105 | 11.776 | 503.1787 | 760.6457 |
| 60000 | 0.6061 | 476.9457 | 1870.4763 | 1.6989 | 21.1827 | 47.208 | 11.802 | 511.0809 | 754.1288 |
| 65000 | 0.6566 | 468.1150 | 1897.7445 | 1.6984 | 21.2396 | 47.082 | 11.77 | 499.0363 | 760.3411 |
| 70000 | 0.7071 | 483.1840 | 1883.0992 | 1.6974 | 21.1979 | 47.174 | 11.794 | 521.3865 | 761.0517 |
| 75000 | 0.7576 | 477.2413 | 1875.6208 | 1.6995 | 21.1986 | 47.173 | 11.793 | 511.2922 | 761.1025 |
| 80000 | 0.8081 | 478.3332 | 1883.3641 | 1.6985 | 21.3287 | 46.885 | 11.721 | 513.3676 | 751.1171 |
| 85000 | 0.8586 | 475.8018 | 1880.9122 | 1.6979 | 21.1634 | 47.251 | 11.813 | 508.5522 | 750.0656 |
| 90000 | 0.9091 | 475.8753 | 1881.9717 | 1.6980 | 21.2695 | 47.016 | 11.754 | 508.5944 | 751.9692 |
| 95000 | 0.9596 | 475.6727 | 1882.6350 | 1.6983 | 21.2248 | 47.115 | 11.779 | 507.8802 | 753.2742 |
| 99000 | 1.0 | 475.6911 | 1883.2976 | 1.6984 | 21.181 | 47.212 | 11.803 | 508.0900 | 753.3747 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.21.0
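
To try the distilled student with the Transformers version listed above, something like the following should work; the repository id is inferred from this card's title and may differ.

```python
# Loading the distilled student for generation. The repo id below is
# inferred from this card's title and may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_obj_cross_v2.10_gpt2"
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```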