---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.10_gpt2
    results: []
---

# distily_bench_obj_cross_v2.10_gpt2

This student model was distilled from the teacher model gpt2; the distillation dataset is unspecified in this card.

The Distily library was used for this distillation.
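
The checkpoint loads like any other transformers causal LM. A minimal usage sketch, assuming the model is published under the repo id lapp0/distily_bench_obj_cross_v2.10_gpt2 (the exact Hub path is an assumption):

```python
# Minimal usage sketch. The repo id below is an assumption based on the
# model name; substitute the actual Hub path if it differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_obj_cross_v2.10_gpt2"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Distillation compresses a teacher model", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```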

It achieves the following results on the evaluation set:

- eval_enwikippl: 452.9807
- eval_frwikippl: 741.6703
- eval_zhwikippl: 169.7969
- eval_tinystoriesppl: 694.5760
- eval_loss: 1.2502
- eval_runtime: 21.1964
- eval_samples_per_second: 47.178
- eval_steps_per_second: 11.794
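
The exact evaluation script is not included in this card. As an illustration of how perplexity metrics such as eval_enwikippl are conventionally computed (exp of the mean token-level cross-entropy), here is a minimal sketch; the repo id and the per-text truncation are assumptions:

```python
# Hypothetical perplexity evaluation: exp of the mean negative log-likelihood
# over a text corpus. The dataset handling is an assumption; the card does
# not specify the exact evaluation procedure.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_obj_cross_v2.10_gpt2"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id).eval()

@torch.no_grad()
def perplexity(texts, max_length=1024):
    nll_sum, n_tokens = 0.0, 0
    for text in texts:
        ids = tokenizer(text, return_tensors="pt",
                        truncation=True, max_length=max_length)["input_ids"]
        # With labels=input_ids the model shifts internally and returns the
        # mean cross-entropy over the (length - 1) predicted positions.
        loss = model(input_ids=ids, labels=ids).loss
        n = ids.size(1) - 1
        nll_sum += loss.item() * n
        n_tokens += n
    return math.exp(nll_sum / n_tokens)

print(perplexity(["Some held-out evaluation text."]))
```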

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None)) (only the logits KL term has nonzero weight; see the sketch after this list)
- train_embeddings: True
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1.0
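
Per the distillation_objective above, only the logits component is active (weight 1, KL loss); the hidden-state and attention components have weight 0. Below is a minimal sketch of such a forward-KL logits loss; Distily's actual implementation (e.g. any temperature scaling or masking) may differ:

```python
# Sketch of the active loss term: KL(teacher || student) over the vocabulary,
# averaged across token positions. Illustrative reimplementation, not
# Distily's internal code.
import torch
import torch.nn.functional as F

def logits_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor) -> torch.Tensor:
    # Flatten (batch, seq, vocab) -> (batch*seq, vocab) so "batchmean"
    # averages the summed KL over all token positions.
    vocab = student_logits.size(-1)
    s = F.log_softmax(student_logits.reshape(-1, vocab), dim=-1)
    t = F.log_softmax(teacher_logits.reshape(-1, vocab), dim=-1)
    # log_target=True: the target is also given as log-probabilities.
    return F.kl_div(s, t, log_target=True, reduction="batchmean")

# Toy shapes: (batch=2, seq=8, vocab=50257), as for GPT-2.
loss = logits_kl_loss(torch.randn(2, 8, 50257), torch.randn(2, 8, 50257))
```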

## Resource Usage

Peak GPU Memory: 3.9285 GB
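
Peak-memory figures like this are typically read from PyTorch's CUDA allocator statistics; a minimal sketch (whether Distily reports allocated or reserved bytes, and GB vs. GiB, is an assumption):

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run the distillation / evaluation loop here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3  # peak allocated, in GiB
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```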

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 270.2348 | 76.8142 | | | | | 671.1238 | 22.8030 |
| 0 | 0 | 120078.375 | 1867851235328.0 | 18.7920 | 21.1643 | 47.249 | 11.812 | 72.8770 | 4013754155008.0 |
| 5000 | 0.0505 | 399.8896 | 1364.9200 | 1.5750 | 21.223 | 47.119 | 11.78 | 430.3431 | 486.9932 |
| 10000 | 0.1010 | 366.0540 | 968.1008 | 1.4975 | 21.2542 | 47.05 | 11.762 | 410.0440 | 300.9413 |
| 15000 | 0.1515 | 382.6534 | 990.8644 | 1.4377 | 21.1883 | 47.196 | 11.799 | 455.3961 | 243.5069 |
| 20000 | 0.2020 | 372.0864 | 985.5745 | 1.4590 | 21.2537 | 47.051 | 11.763 | 430.2186 | 317.8063 |
| 25000 | 0.2525 | 459.8662 | 802.9102 | 1.3109 | 21.2174 | 47.131 | 11.783 | 674.2657 | 183.8540 |
| 30000 | 0.3030 | 452.4371 | 822.7448 | 1.2777 | 21.2291 | 47.105 | 11.776 | 674.3492 | 162.7067 |
| 35000 | 0.3535 | 476.7241 | 805.2602 | 1.2741 | 21.2169 | 47.132 | 11.783 | 736.0758 | 174.6150 |
| 40000 | 0.4040 | 453.2438 | 770.2305 | 1.2733 | 21.1947 | 47.181 | 11.795 | 679.9471 | 163.0870 |
| 45000 | 0.4545 | 460.5169 | 781.4591 | 1.2687 | 21.2116 | 47.144 | 11.786 | 700.2546 | 183.2052 |
| 50000 | 0.5051 | 479.0564 | 794.0530 | 1.2632 | 21.229 | 47.105 | 11.776 | 743.4755 | 181.4419 |
| 55000 | 0.5556 | 471.3993 | 748.4656 | 1.2630 | 21.215 | 47.137 | 11.784 | 731.375 | 172.6117 |
| 60000 | 0.6061 | 446.4142 | 775.7834 | 1.2687 | 21.1528 | 47.275 | 11.819 | 669.1851 | 164.7928 |
| 65000 | 0.6566 | 455.8672 | 744.0773 | 1.2538 | 21.2207 | 47.124 | 11.781 | 698.6068 | 164.4469 |
| 70000 | 0.7071 | 453.5074 | 740.2094 | 1.2513 | 21.3501 | 46.838 | 11.71 | 697.8277 | 168.6457 |
| 75000 | 0.7576 | 450.4874 | 723.2042 | 1.2535 | 21.2028 | 47.164 | 11.791 | 685.8463 | 167.9272 |
| 80000 | 0.8081 | 455.6377 | 745.9662 | 1.2523 | 21.2178 | 47.13 | 11.783 | 701.7324 | 170.4892 |
| 85000 | 0.8586 | 447.3922 | 746.4918 | 1.2509 | 21.2165 | 47.133 | 11.783 | 681.8325 | 168.7976 |
| 90000 | 0.9091 | 453.0859 | 740.9397 | 1.2505 | 21.1987 | 47.173 | 11.793 | 696.0992 | 169.7290 |
| 95000 | 0.9596 | 451.3083 | 741.0439 | 1.2504 | 21.5668 | 46.368 | 11.592 | 690.2544 | 169.7969 |
| 99000 | 1.0 | 452.9807 | 741.6703 | 1.2502 | 21.1964 | 47.178 | 11.794 | 694.5760 | 169.7969 |

## Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- PyTorch 2.3.0
- Datasets 2.21.0