base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.7
    results: []

distily_bench_obj_cross_v2.7

This student model is distilled from the teacher model roneneldan/TinyStories-33M. The training dataset is not specified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 203.4302
  • eval_frwikippl: 112507.4219
  • eval_zhwikippl: 1544401.25
  • eval_tinystoriesppl: 10.4258
  • eval_loss: 1.2132
  • eval_runtime: 6.5192
  • eval_samples_per_second: 76.696
  • eval_steps_per_second: 9.664
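The throughput figures above can be cross-checked against each other. The sketch below is only a sanity check: it multiplies each rate by the runtime to recover the approximate number of evaluation samples and steps, then compares the step count with what eval_batch_size: 8 (listed under the training hyperparameters) would imply. It assumes a single evaluation device and that the last, partial batch counts as a step; neither is stated in this card.

```python
import math

# Values copied from the evaluation results and hyperparameters in this card.
eval_runtime = 6.5192             # seconds
eval_samples_per_second = 76.696
eval_steps_per_second = 9.664
eval_batch_size = 8

# Rate * time recovers the approximate counts (rounded, since the
# reported rates are themselves rounded).
n_samples = round(eval_samples_per_second * eval_runtime)
n_steps = round(eval_steps_per_second * eval_runtime)

# Assumption: one device, and a trailing partial batch still counts as a step.
steps_from_batch_size = math.ceil(n_samples / eval_batch_size)

print(f"approx. eval samples: {n_samples}")
print(f"approx. eval steps:   {n_steps}")
print(f"steps implied by batch size {eval_batch_size}: {steps_from_batch_size}")
```

If the two step counts agree, the reported rates are consistent with the stated evaluation batch size.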

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.004
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1.0
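The distillation_objective above puts all of the weight on the logits component with loss_fn=kl, while the hidden-state (hs) and attention (attn) components have weight 0. As a rough illustration of what a KL-divergence logits loss computes, here is a minimal pure-Python sketch for a single token position. The function names are hypothetical, and the direction (teacher relative to student), the absence of a temperature, and the lack of any batch reduction are assumptions; Distily's own implementation may differ on all of these.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) between the two softmax distributions.

    Assumption: the teacher distribution is the reference distribution.
    """
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))


def distillation_loss(teacher_logits, student_logits, logits_weight=1.0):
    """Weighted sum of loss components.

    Only the logits term is included because the hs and attn components
    have weight 0 in this run and so contribute nothing.
    """
    return logits_weight * kl_logits_loss(teacher_logits, student_logits)


# Toy example: teacher is uniform over two tokens, student prefers the first.
teacher = [0.0, 0.0]
student = [math.log(3.0), 0.0]   # softmax gives probabilities (0.75, 0.25)
print(distillation_loss(teacher, student))
print(distillation_loss(teacher, teacher))   # identical distributions
```

The loss is zero when the student's distribution matches the teacher's and grows as the two diverge.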

Resource Usage

Peak GPU Memory: 6.6058 GB

Eval-Phase Metrics

step epoch enwikippl frwikippl loss runtime samples_per_second steps_per_second tinystoriesppl zhwikippl
teacher eval - 169.9865 47377.9414 - - - - 3.9789 4998.1294
0 0 23232.2363 111004.0469 6.4068 6.5038 76.879 9.687 9550.5166 102446.0156
500 0.0101 286.9499 206403.0312 1.3987 6.5068 76.843 9.682 11.3159 2106274.75
1000 0.0202 213.8942 135439.8906 1.2477 6.4902 77.039 9.707 10.3927 1507759.375
1500 0.0303 202.3927 107609.4375 1.2191 6.4909 77.031 9.706 10.4513 1709189.875
2000 0.0404 204.6472 114618.8281 1.2146 6.4939 76.996 9.701 10.4206 1537001.75
2500 0.0505 208.5117 122800.5781 1.2144 6.4852 77.099 9.714 10.4021 1744199.25
3000 0.0606 203.1310 112761.25 1.2126 6.5107 76.796 9.676 10.3665 1447065.25
3500 0.0707 205.6802 116737.0 1.2140 6.5089 76.818 9.679 10.3661 1639513.0
4000 0.0808 206.5504 110652.8281 1.2139 6.5215 76.669 9.66 10.4681 1485400.5
4500 0.0909 205.1075 109474.6094 1.2136 6.5078 76.83 9.681 10.4159 1445137.25
5000 0.1010 203.7140 110279.3672 1.2138 6.5456 76.387 9.625 10.4668 1533724.75
5500 0.1111 205.5050 114489.6953 1.2128 6.5317 76.55 9.645 10.4305 1672203.5
6000 0.1212 202.5809 106884.2891 1.2136 6.5309 76.559 9.646 10.4206 1468064.25
6500 0.1313 202.1420 113494.2344 1.2131 6.5183 76.707 9.665 10.3451 1576036.75
7000 0.1414 206.9188 116819.1875 1.2130 6.4962 76.968 9.698 10.4310 1638637.625
7500 0.1515 203.4144 107367.2109 1.2134 6.5755 76.04 9.581 10.4859 1450158.5
8000 0.1616 203.0209 113174.9531 1.2135 6.5095 76.811 9.678 10.3033 1514209.25
8500 0.1717 199.6443 100468.0781 1.2131 6.5168 76.725 9.667 10.4392 1311043.875
9000 0.1818 200.8854 108095.5391 1.2137 6.5151 76.745 9.67 10.2138 1448225.0
9500 0.1919 203.4302 112507.4219 1.2132 6.5192 76.696 9.664 10.4258 1544401.25
10000 0.2020 205.1313 110403.75 1.2134 6.5041 76.875 9.686 10.4660 1595496.875
10500 0.2121 204.6551 114103.3906 1.2130 6.5153 76.742 9.67 10.3361 1507759.375
11000 0.2222 205.3062 115477.6953 1.2134 6.5075 76.834 9.681 10.4582 1510175.625
11500 0.2323 203.7376 113942.7812 1.2127 6.5247 76.632 9.656 10.2722 1610896.125
12000 0.2424 202.9187 112713.6172 1.2131 6.5093 76.814 9.679 10.2812 1613476.0
12500 0.2525 206.7906 114135.5 1.2134 6.4932 77.004 9.702 10.4301 1588700.75
13000 0.2626 204.8058 115885.1172 1.2133 6.5624 76.192 9.6 10.4284 1610896.125
13500 0.2727 204.1484 111434.8906 1.2131 6.5157 76.738 9.669 10.4841 1585313.625
14000 0.2828 205.4413 112888.3203 1.2134 6.5375 76.482 9.637 10.4280 1526377.875
14500 0.2929 205.4494 112856.5703 1.2134 6.5202 76.685 9.662 10.4262 1574355.75
15000 0.3030 205.5369 111938.2734 1.2127 6.5102 76.802 9.677 10.4008 1513402.25
15500 0.3131 203.0129 113366.3672 1.2131 6.5084 76.824 9.68 10.2892 1587854.125
16000 0.3232 205.5687 112824.8203 1.2131 6.5421 76.428 9.63 10.4383 1514209.25
16500 0.3333 204.0773 109289.7656 1.2133 6.506 76.853 9.683 10.3910 1506954.375
17000 0.3434 203.9982 112824.8203 1.2133 6.5214 76.671 9.661 10.4482 1531272.375
17500 0.3535 206.1109 113558.2188 1.2127 6.5276 76.598 9.651 10.3738 1565141.625
18000 0.3636 202.1342 104355.2578 1.2131 6.5179 76.711 9.666 10.5219 1446294.0
18500 0.3737 204.3779 110777.6328 1.2133 6.5025 76.893 9.689 10.4392 1523935.75
19000 0.3838 206.5424 116162.8438 1.2127 6.4979 76.948 9.695 10.3374 1635144.0
19500 0.3939 204.2829 112127.7031 1.2129 6.5297 76.573 9.648 10.4223 1463371.75
20000 0.4040 202.3927 108844.1641 1.2132 6.5051 76.863 9.685 10.4413 1544401.25
20500 0.4141 202.5260 112777.1641 1.2129 6.5005 76.917 9.692 10.2214 1525562.875
21000 0.4242 205.3539 112380.6719 1.2131 6.5062 76.85 9.683 10.4495 1508563.375
21500 0.4343 204.9645 111749.2656 1.2127 6.5116 76.786 9.675 10.4219 1483816.125
22000 0.4444 202.1969 112269.9062 1.2133 6.5089 76.817 9.679 10.2625 1589549.5
22500 0.4545 207.1032 116425.0312 1.2134 6.5782 76.009 9.577 10.4021 1580246.375
23000 0.4646 202.7693 111560.5781 1.2130 6.5092 76.814 9.679 10.3301 1416885.75
23500 0.4747 205.1631 115266.4453 1.2131 6.5057 76.855 9.684 10.3144 1592945.75
24000 0.4848 202.1498 110730.8438 1.2127 6.533 76.534 9.643 10.2926 1512594.25
24500 0.4949 203.0680 111011.8828 1.2129 6.5139 76.759 9.672 10.4000 1447451.75
25000 0.5051 206.4065 112254.0625 1.2134 6.5269 76.606 9.652 10.4297 1569322.125
25500 0.5152 202.6124 107427.6406 1.2126 6.5157 76.738 9.669 10.4262 1486192.625
26000 0.5253 202.0011 105775.9453 1.2127 6.5572 76.253 9.608 10.4060 1452093.125
26500 0.5354 202.9501 108034.6328 1.2127 6.568 76.126 9.592 10.3712 1466498.375
27000 0.5455 202.7301 108095.5391 1.2132 6.4898 77.044 9.708 10.3665 1505347.0
27500 0.5556 205.7360 112507.4219 1.2127 6.5255 76.623 9.654 10.4413 1565141.625
28000 0.5657 202.0011 108034.6328 1.2129 6.5214 76.671 9.661 10.3837 1484607.375
28500 0.5758 203.2096 108706.2969 1.2129 6.523 76.652 9.658 10.4331 1527191.875
29000 0.5859 203.6351 110699.5859 1.2126 6.5222 76.661 9.659 10.4094 1558474.75
29500 0.5960 202.6594 107065.1641 1.2134 6.5256 76.622 9.654 10.3944 1468064.25
30000 0.6061 202.6280 107306.7109 1.2129 6.519 76.699 9.664 10.4103 1475918.5
30500 0.6162 202.7379 108645.0469 1.2128 6.5228 76.654 9.658 10.4185 1483816.125
31000 0.6263 203.3198 109120.5234 1.2133 6.4945 76.988 9.7 10.4301 1528822.5
31500 0.6364 203.8245 111875.3047 1.2126 6.4967 76.962 9.697 10.4008 1555152.125
32000 0.6465 202.9107 107730.7109 1.2131 6.493 77.007 9.703 10.4219 1534544.125
32500 0.6566 201.8682 108339.4062 1.2131 6.4912 77.027 9.705 10.3674 1543578.125
33000 0.6667 203.7298 109922.6797 1.2126 6.4859 77.091 9.713 10.3948 1507759.375
33500 0.6768 203.3198 109582.6172 1.2127 6.4963 76.967 9.698 10.3961 1475918.5
34000 0.6869 204.9724 111309.4609 1.2127 6.4903 77.038 9.707 10.4452 1569322.125
34500 0.6970 203.1310 108828.9062 1.2128 6.4917 77.021 9.705 10.3918 1510175.625
35000 0.7071 202.9815 109197.3516 1.2128 6.5052 76.861 9.685 10.3639 1523935.75
35500 0.7172 203.1467 109135.9297 1.2131 6.4998 76.925 9.693 10.4056 1473558.5
36000 0.7273 203.3513 109151.2266 1.2131 6.5407 76.444 9.632 10.4163 1507759.375
36500 0.7374 202.7301 110808.8047 1.2131 6.511 76.794 9.676 10.3554 1549353.625
37000 0.7475 202.9422 109258.9141 1.2129 6.5252 76.626 9.655 10.3987 1519876.25
37500 0.7576 203.2096 110139.6875 1.2127 6.4909 77.031 9.706 10.4038 1506954.375
38000 0.7677 204.0456 109860.7422 1.2125 6.5271 76.603 9.652 10.4392 1483024.0
38500 0.7778 203.1624 109197.3516 1.2129 6.5222 76.661 9.659 10.3837 1487779.5
39000 0.7879 203.2883 109274.3359 1.2126 6.703 74.593 9.399 10.4038 1519064.75
39500 0.7980 203.1152 109089.8281 1.2129 6.5 76.923 9.692 10.4211 1506151.25
40000 0.8081 203.2411 109551.6875 1.2128 6.5023 76.896 9.689 10.4129 1498136.125
40500 0.8182 203.1467 109767.9531 1.2127 6.5007 76.915 9.691 10.4167 1483816.125
41000 0.8283 202.9107 108767.5859 1.2131 6.5147 76.75 9.671 10.4017 1498935.0
41500 0.8384 202.9736 109151.2266 1.2126 6.5167 76.726 9.667 10.4038 1485400.5
42000 0.8485 203.2883 109644.3984 1.2127 6.5139 76.759 9.672 10.4051 1476707.0
42500 0.8586 202.5966 108767.5859 1.2126 6.5096 76.81 9.678 10.4069 1475918.5
43000 0.8687 204.2354 110668.4453 1.2127 6.5038 76.878 9.687 10.4211 1520686.625
43500 0.8788 203.8402 111403.5469 1.2124 6.541 76.44 9.631 10.3708 1523123.625
44000 0.8889 204.0615 111215.3359 1.2126 6.5392 76.462 9.634 10.3884 1531272.375
44500 0.8990 203.8245 110263.9062 1.2126 6.5434 76.413 9.628 10.3875 1507759.375
45000 0.9091 203.8718 110668.4453 1.2125 6.5546 76.283 9.612 10.3854 1514209.25
45500 0.9192 203.0051 109953.7188 1.2126 6.52 76.687 9.663 10.3824 1494942.0
46000 0.9293 203.7772 110606.0938 1.2126 6.4887 77.058 9.709 10.4064 1504544.75
46500 0.9394 203.8087 110606.0938 1.2127 6.502 76.899 9.689 10.4051 1504544.75
47000 0.9495 203.3671 109953.7188 1.2126 6.5034 76.883 9.687 10.3957 1504544.75
47500 0.9596 203.6351 110357.1172 1.2126 6.4876 77.07 9.711 10.4004 1506954.375
48000 0.9697 203.2254 110512.6719 1.2124 6.5074 76.835 9.681 10.3837 1504544.75
48500 0.9798 203.6509 110606.0938 1.2124 6.504 76.876 9.686 10.4060 1504544.75
49000 0.9899 203.7140 110606.0938 1.2126 6.5071 76.839 9.682 10.4090 1507759.375
49500 1.0 203.7456 110606.0938 1.2126 6.5184 76.706 9.665 10.4090 1507759.375
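One way to read the table is to compare the student's final-step perplexities with the teacher's. The sketch below computes the student-to-teacher ratio for each perplexity column, using the values from the last row (step 49500) and the teacher eval row. It assumes the four teacher values correspond, in order, to enwikippl, frwikippl, tinystoriesppl, and zhwikippl; that mapping is inferred from the column order and the magnitudes, and is not stated explicitly in the flattened table.

```python
# Teacher perplexities from the "teacher eval" row.
# Assumption: the four values map to the four *ppl columns in header order.
teacher_ppl = {
    "enwikippl": 169.9865,
    "frwikippl": 47377.9414,
    "tinystoriesppl": 3.9789,
    "zhwikippl": 4998.1294,
}

# Student perplexities from the final row (step 49500, epoch 1.0).
student_ppl = {
    "enwikippl": 203.7456,
    "frwikippl": 110606.0938,
    "tinystoriesppl": 10.4090,
    "zhwikippl": 1507759.375,
}

# A ratio above 1 means the student's perplexity is worse than the teacher's.
ratios = {name: student_ppl[name] / teacher_ppl[name] for name in teacher_ppl}

for name, ratio in ratios.items():
    print(f"{name}: student/teacher = {ratio:.2f}x")
```

A ratio close to 1 indicates the student nearly matches the teacher on that dataset; larger ratios show where the gap is widest.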

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.21.0