---
base_model: gpt2
library_name: distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_gpt2_linear_objectives
    results: []
---

# distily_bench_gpt2_optim

This student model is distilled from the teacher model [gpt2](https://huggingface.co/gpt2) on an unspecified dataset.

The Distily library was used for this distillation.
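
As a quick usage reference, the sketch below loads the student model with the Transformers library and samples a short continuation. The repository id is an assumption inferred from the model-index name in the metadata; substitute the actual repo path if it differs.

```python
# Minimal usage sketch. Assumption: the model is hosted under the
# model-index name from the metadata above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lapp0/distily_bench_gpt2_linear_objectives"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Knowledge distillation is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```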

It achieves the following results on the evaluation set (a sketch of how these perplexities can be computed follows the list):

- eval_enwikippl: 527.0228
- eval_frwikippl: 3796.0032
- eval_zhwikippl: 4795.4683
- eval_loss: 2376.6721
- eval_runtime: 21.817 (s)
- eval_samples_per_second: 45.836
- eval_steps_per_second: 11.459
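
The `*ppl` metrics are perplexities on English, French, and Chinese Wikipedia text respectively. Below is a minimal sketch of computing perplexity with a causal LM, assuming the standard definition exp(mean token cross-entropy); the exact evaluation corpus and batching Distily uses are not specified in this card.

```python
import torch

def perplexity(model, tokenizer, text: str) -> float:
    """Perplexity of a causal LM on one text sample (standard definition assumed)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # next-token cross-entropy over the sequence.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

# e.g. perplexity(model, tokenizer, "<held-out enwiki passage>")
```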

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: LinearObjective(logits_weight=1, logits_loss_fn=kl_divergence_loss, activations_weight=1, activations_loss_fn=kl_divergence_loss, attentions_weight=0, attentions_loss_fn=mse_loss) (a sketch of this objective follows the list)
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
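
The sketch below illustrates the shape of the `LinearObjective` above: a weighted sum of a KL-divergence term on logits and a KL-divergence term on hidden-state activations, with the attention term dropped because its weight is 0. This is an assumption-laden illustration, not Distily's implementation; details such as temperature, layer matching, and normalization may differ, and both forward passes are assumed to run with `output_hidden_states=True`.

```python
import torch
import torch.nn.functional as F

def kl_divergence_loss(student: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
    # KL(teacher || student), treating the last dimension as a distribution.
    return F.kl_div(
        F.log_softmax(student, dim=-1),
        F.softmax(teacher, dim=-1),
        reduction="batchmean",  # summed per distribution, averaged over dim 0
    )

def linear_objective(student_out, teacher_out,
                     logits_weight=1.0, activations_weight=1.0):
    # Logit-matching term.
    loss = logits_weight * kl_divergence_loss(student_out.logits, teacher_out.logits)
    # Activation-matching term, pairing hidden states layer by layer
    # (assumption: student and teacher expose the same number of hidden states).
    for s_h, t_h in zip(student_out.hidden_states, teacher_out.hidden_states):
        loss = loss + activations_weight * kl_divergence_loss(s_h, t_h)
    # attentions_weight=0 in this run, so the MSE attention term is omitted.
    return loss
```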

### Resource Usage

Peak GPU Memory: 4.5067 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples/s | steps/s | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 30.2385 | 57.2728 | | | | | 18.1772 |
| 0 | 0 | 55339.3672 | 57682.5742 | 31197.1836 | 21.7082 | 46.065 | 11.516 | 57080.2930 |
| 500 | 0.0808 | 1509.5735 | 7497.0439 | 3194.9919 | 21.4587 | 46.601 | 11.65 | 50589.3438 |
| 1000 | 0.1616 | 1083.2607 | 5620.3037 | 2923.7439 | 21.5879 | 46.322 | 11.581 | 29616.2285 |
| 1500 | 0.2424 | 906.6083 | 4937.0078 | 2796.2080 | 21.6636 | 46.16 | 11.54 | 21403.5996 |
| 2000 | 0.3232 | 813.4678 | 4877.3267 | 2706.0481 | 21.5303 | 46.446 | 11.612 | 20010.4863 |
| 2500 | 0.4040 | 750.0352 | 4512.8765 | 2636.6079 | 21.6059 | 46.284 | 11.571 | 16546.3457 |
| 3000 | 0.4848 | 704.7218 | 4373.6377 | 2583.7920 | 21.6069 | 46.281 | 11.57 | 14758.0859 |
| 3500 | 0.5657 | 667.2821 | 4153.7866 | 2537.5520 | 21.59 | 46.318 | 11.579 | 14131.2881 |
| 4000 | 0.6465 | 635.3494 | 4060.9749 | 2505.6001 | 21.554 | 46.395 | 11.599 | 13081.5996 |
| 4500 | 0.7273 | 605.6495 | 4037.2766 | 2468.9121 | 21.795 | 45.882 | 11.471 | 11453.9658 |
| 5000 | 0.8081 | 573.4954 | 3881.2524 | 2437.7439 | 21.6801 | 46.125 | 11.531 | 8931.2441 |
| 5500 | 0.8889 | 557.2740 | 3918.3730 | 2413.4880 | 21.5054 | 46.5 | 11.625 | 6643.0454 |
| 6000 | 0.9697 | 549.7523 | 4035.1443 | 2392.2400 | 21.6194 | 46.255 | 11.564 | 5330.4404 |
| 6187 | 0.9999 | 527.0228 | 3796.0032 | 2376.6721 | 21.817 | 45.836 | 11.459 | 4795.4683 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0