lapp0 committed
Commit c3ea855 · verified · 1 Parent(s): 75cf9c9

End of training

README.md CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
- - eval_enwikippl: 3607.3171
- - eval_frwikippl: 29425.125
- - eval_zhwikippl: 52510.3125
- - eval_tinystoriesppl: 1167.9218
- - eval_loss: 5.1093
- - eval_runtime: 6.526
- - eval_samples_per_second: 76.617
- - eval_steps_per_second: 9.654
+ - eval_enwikippl: 184.3409
+ - eval_frwikippl: 58809.4336
+ - eval_zhwikippl: 498418.7812
+ - eval_tinystoriesppl: 10.4219
+ - eval_loss: 1.3030
+ - eval_runtime: 6.5437
+ - eval_samples_per_second: 76.409
+ - eval_steps_per_second: 9.628
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -47,8 +47,8 @@ More information needed
 The following hyperparameters were used during training:
 - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
 - train_embeddings: True
- - learning_rate: 0.0004
- - train_batch_size: 8
+ - learning_rate: 0.004
+ - train_batch_size: 1
 - eval_batch_size: 8
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
@@ -56,26 +56,112 @@ The following hyperparameters were used during training:
 - num_epochs: 1.0
 
 ### Resource Usage
- Peak GPU Memory: 8.0568 GB
+ Peak GPU Memory: 6.6047 GB
 
 ### Eval-Phase Metrics
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
- | 0 | 0 | 21321.3555 | 56774.5312 | 6.6010 | 6.5485 | 76.353 | 9.621 | 11289.9248 | 60744.7383 |
- | 500 | 0.0808 | 3754.7207 | 29512.3027 | 5.1110 | 6.493 | 77.007 | 9.703 | 1235.4543 | 53915.7461 |
- | 1000 | 0.1616 | 3629.7410 | 29470.7617 | 5.1093 | 6.5015 | 76.906 | 9.69 | 1179.3701 | 52678.6953 |
- | 1500 | 0.2424 | 3604.8032 | 29425.125 | 5.1093 | 6.4868 | 77.08 | 9.712 | 1167.5359 | 52510.3125 |
- | 2000 | 0.3232 | 3604.8032 | 29425.125 | 5.1093 | 6.4978 | 76.949 | 9.696 | 1167.3427 | 52510.3125 |
- | 2500 | 0.4040 | 3607.3171 | 29425.125 | 5.1093 | 6.5048 | 76.866 | 9.685 | 1167.9218 | 52510.3125 |
- | 3000 | 0.4848 | 3607.3171 | 29425.125 | 5.1093 | 6.5218 | 76.666 | 9.66 | 1167.9218 | 52510.3125 |
- | 3500 | 0.5656 | 3607.3171 | 29425.125 | 5.1093 | 6.5134 | 76.764 | 9.672 | 1167.9218 | 52510.3125 |
- | 4000 | 0.6464 | 3607.3171 | 29425.125 | 5.1093 | 6.6885 | 74.755 | 9.419 | 1167.9218 | 52510.3125 |
- | 4500 | 0.7272 | 3607.3171 | 29425.125 | 5.1093 | 6.5018 | 76.902 | 9.69 | 1167.9218 | 52510.3125 |
- | 5000 | 0.8080 | 3607.3171 | 29425.125 | 5.1093 | 6.5019 | 76.9 | 9.689 | 1167.9218 | 52510.3125 |
- | 5500 | 0.8888 | 3607.3171 | 29425.125 | 5.1093 | 6.5496 | 76.34 | 9.619 | 1167.9218 | 52510.3125 |
- | 6000 | 0.9696 | 3607.3171 | 29425.125 | 5.1093 | 6.5045 | 76.87 | 9.686 | 1167.9218 | 52510.3125 |
- | 6188 | 1.0 | 3607.3171 | 29425.125 | 5.1093 | 6.526 | 76.617 | 9.654 | 1167.9218 | 52510.3125 |
+ | 0 | 0 | 9095.8965 | 52350.1094 | 6.1545 | 6.5255 | 76.623 | 9.654 | 3753.9187 | 61167.5938 |
+ | 500 | 0.0101 | 243.8593 | 125008.3047 | 1.4706 | 6.4961 | 76.969 | 9.698 | 11.1718 | 994172.1875 |
+ | 1000 | 0.0202 | 201.1267 | 66485.8203 | 1.3342 | 6.5302 | 76.567 | 9.647 | 10.9148 | 584009.625 |
+ | 1500 | 0.0303 | 185.1781 | 63189.5391 | 1.3091 | 6.4906 | 77.035 | 9.706 | 10.4357 | 522659.3125 |
+ | 2000 | 0.0404 | 184.7626 | 62235.5703 | 1.3033 | 6.5025 | 76.893 | 9.689 | 10.3532 | 500283.6562 |
+ | 2500 | 0.0505 | 190.2160 | 64720.9258 | 1.3022 | 6.502 | 76.9 | 9.689 | 10.4971 | 539807.5625 |
+ | 3000 | 0.0606 | 187.8947 | 62657.7461 | 1.3021 | 6.5475 | 76.365 | 9.622 | 10.4232 | 546764.75 |
+ | 3500 | 0.0707 | 187.1394 | 64284.8477 | 1.3019 | 6.5077 | 76.832 | 9.681 | 10.3626 | 568179.5 |
+ | 4000 | 0.0808 | 187.8584 | 63189.5391 | 1.3016 | 6.5082 | 76.826 | 9.68 | 10.5624 | 544435.75 |
+ | 4500 | 0.0909 | 185.7313 | 64230.4922 | 1.3024 | 6.5949 | 75.816 | 9.553 | 10.3738 | 557964.625 |
+ | 5000 | 0.1010 | 188.6459 | 64407.2109 | 1.3027 | 6.5061 | 76.851 | 9.683 | 10.5568 | 520155.6562 |
+ | 5500 | 0.1111 | 184.7412 | 58677.0 | 1.3028 | 6.5056 | 76.857 | 9.684 | 10.5502 | 500017.0312 |
+ | 6000 | 0.1212 | 187.4877 | 65491.2656 | 1.3021 | 6.4967 | 76.962 | 9.697 | 10.3348 | 554995.375 |
+ | 6500 | 0.1313 | 186.3582 | 64266.7031 | 1.3021 | 6.5693 | 76.112 | 9.59 | 10.3983 | 553221.4375 |
+ | 7000 | 0.1414 | 187.1901 | 63242.9531 | 1.3018 | 6.544 | 76.406 | 9.627 | 10.5450 | 506461.375 |
+ | 7500 | 0.1515 | 186.2933 | 59626.8398 | 1.3022 | 6.5016 | 76.904 | 9.69 | 10.5895 | 530102.25 |
+ | 8000 | 0.1616 | 186.3437 | 63546.5742 | 1.3018 | 6.5127 | 76.773 | 9.673 | 10.3340 | 530385.4375 |
+ | 8500 | 0.1717 | 185.1208 | 60319.5742 | 1.3025 | 6.4954 | 76.977 | 9.699 | 10.4469 | 510258.875 |
+ | 9000 | 0.1818 | 181.6476 | 60549.3633 | 1.3035 | 6.5012 | 76.909 | 9.691 | 10.2438 | 524055.8125 |
+ | 9500 | 0.1919 | 184.3409 | 58809.4336 | 1.3030 | 6.5437 | 76.409 | 9.628 | 10.4219 | 498418.7812 |
+ | 10000 | 0.2020 | 186.4449 | 62923.0820 | 1.3019 | 6.5079 | 76.83 | 9.681 | 10.4482 | 542695.5625 |
+ | 10500 | 0.2121 | 188.1788 | 63318.7539 | 1.3030 | 6.4961 | 76.969 | 9.698 | 10.5502 | 541827.25 |
+ | 11000 | 0.2222 | 187.6693 | 64289.3242 | 1.3024 | 6.5016 | 76.905 | 9.69 | 10.4733 | 553221.4375 |
+ | 11500 | 0.2323 | 185.5802 | 64248.625 | 1.3021 | 6.5073 | 76.837 | 9.681 | 10.2012 | 533080.6875 |
+ | 12000 | 0.2424 | 189.5834 | 63672.0234 | 1.3022 | 6.5265 | 76.611 | 9.653 | 10.5284 | 528407.875 |
+ | 12500 | 0.2525 | 180.2249 | 59601.6523 | 1.3024 | 6.5297 | 76.573 | 9.648 | 10.1570 | 495766.4062 |
+ | 13000 | 0.2626 | 182.8547 | 61321.8477 | 1.3034 | 6.5338 | 76.525 | 9.642 | 10.2544 | 525455.5625 |
+ | 13500 | 0.2727 | 187.0524 | 64284.8477 | 1.3028 | 6.4889 | 77.054 | 9.709 | 10.3267 | 553516.4375 |
+ | 14000 | 0.2828 | 185.0635 | 62042.9727 | 1.3031 | 6.4841 | 77.112 | 9.716 | 10.4310 | 506461.375 |
+ | 14500 | 0.2929 | 186.0697 | 62437.5156 | 1.3017 | 6.5099 | 76.806 | 9.678 | 10.3867 | 528267.3125 |
+ | 15000 | 0.3030 | 184.3052 | 59929.9531 | 1.3035 | 6.4988 | 76.937 | 9.694 | 10.4081 | 489719.125 |
+ | 15500 | 0.3131 | 186.6616 | 62604.8242 | 1.3016 | 6.4906 | 77.035 | 9.706 | 10.4435 | 530668.25 |
+ | 16000 | 0.3232 | 189.1726 | 65017.875 | 1.3019 | 6.4903 | 77.038 | 9.707 | 10.4482 | 533792.375 |
+ | 16500 | 0.3333 | 187.0235 | 61572.8906 | 1.3026 | 6.5061 | 76.851 | 9.683 | 10.3716 | 514908.6875 |
+ | 17000 | 0.3434 | 187.6693 | 63296.4727 | 1.3018 | 6.5151 | 76.745 | 9.67 | 10.4456 | 564854.625 |
+ | 17500 | 0.3535 | 186.0697 | 63887.6484 | 1.3026 | 6.5001 | 76.922 | 9.692 | 10.4984 | 529254.625 |
+ | 18000 | 0.3636 | 185.4366 | 61538.1992 | 1.3022 | 6.5727 | 76.072 | 9.585 | 10.4500 | 524475.3125 |
+ | 18500 | 0.3737 | 184.0769 | 60874.3516 | 1.3021 | 6.5196 | 76.692 | 9.663 | 10.2820 | 501887.875 |
+ | 19000 | 0.3838 | 187.3134 | 62834.5117 | 1.3021 | 6.5518 | 76.315 | 9.616 | 10.5154 | 515458.4688 |
+ | 19500 | 0.3939 | 187.0235 | 61851.0312 | 1.3027 | 6.5237 | 76.643 | 9.657 | 10.4323 | 531234.875 |
+ | 20000 | 0.4040 | 185.3360 | 59694.0898 | 1.3029 | 6.4861 | 77.088 | 9.713 | 10.4937 | 489719.125 |
+ | 20500 | 0.4141 | 185.0204 | 61746.5977 | 1.3032 | 6.4963 | 76.967 | 9.698 | 10.3011 | 531802.625 |
+ | 21000 | 0.4242 | 186.9511 | 65935.5312 | 1.3018 | 6.5055 | 76.858 | 9.684 | 10.3541 | 560351.5 |
+ | 21500 | 0.4343 | 184.4909 | 61642.2812 | 1.3018 | 6.4899 | 77.042 | 9.707 | 10.4206 | 499483.7188 |
+ | 22000 | 0.4444 | 186.8424 | 60728.7539 | 1.3014 | 6.4896 | 77.046 | 9.708 | 10.5957 | 501085.375 |
+ | 22500 | 0.4545 | 186.3437 | 60711.6719 | 1.3017 | 6.6171 | 75.561 | 9.521 | 10.4323 | 501620.375 |
+ | 23000 | 0.4646 | 184.1268 | 61494.8438 | 1.3016 | 6.5961 | 75.803 | 9.551 | 10.3293 | 515458.4688 |
+ | 23500 | 0.4747 | 187.1104 | 64284.8477 | 1.3023 | 6.4979 | 76.948 | 9.696 | 10.4508 | 528690.125 |
+ | 24000 | 0.4848 | 185.9689 | 62332.0352 | 1.3023 | 6.4867 | 77.08 | 9.712 | 10.3674 | 536362.1875 |
+ | 24500 | 0.4949 | 186.0769 | 62402.3359 | 1.3023 | 6.5117 | 76.785 | 9.675 | 10.3545 | 517940.0 |
+ | 25000 | 0.5051 | 185.3647 | 62587.2148 | 1.3016 | 6.4943 | 76.99 | 9.701 | 10.3721 | 528831.3125 |
+ | 25500 | 0.5152 | 186.1995 | 63011.7734 | 1.3026 | 6.515 | 76.746 | 9.67 | 10.4357 | 506461.375 |
+ | 26000 | 0.5253 | 187.0524 | 62799.1641 | 1.3022 | 6.5338 | 76.526 | 9.642 | 10.4137 | 506731.9375 |
+ | 26500 | 0.5354 | 186.1995 | 62657.7461 | 1.3022 | 6.5427 | 76.421 | 9.629 | 10.3879 | 500817.8125 |
+ | 27000 | 0.5455 | 186.0697 | 61999.3203 | 1.3021 | 6.4978 | 76.949 | 9.696 | 10.4167 | 514359.4688 |
+ | 27500 | 0.5556 | 187.0235 | 63296.4727 | 1.3026 | 6.4941 | 76.993 | 9.701 | 10.4297 | 520155.6562 |
+ | 28000 | 0.5657 | 187.0307 | 62112.9492 | 1.3019 | 6.491 | 77.03 | 9.706 | 10.4850 | 505651.5625 |
+ | 28500 | 0.5758 | 185.9976 | 62976.2695 | 1.3018 | 6.4918 | 77.021 | 9.705 | 10.3618 | 507814.625 |
+ | 29000 | 0.5859 | 187.1829 | 64575.2383 | 1.3024 | 6.4998 | 76.926 | 9.693 | 10.4586 | 538369.0625 |
+ | 29500 | 0.5960 | 186.5315 | 64176.2422 | 1.3023 | 6.5299 | 76.571 | 9.648 | 10.3652 | 549397.0 |
+ | 30000 | 0.6061 | 187.8438 | 62463.9023 | 1.3026 | 6.528 | 76.594 | 9.651 | 10.5163 | 528972.0625 |
+ | 30500 | 0.6162 | 186.2427 | 62025.5195 | 1.3021 | 6.5132 | 76.768 | 9.673 | 10.3738 | 520433.0 |
+ | 31000 | 0.6263 | 187.3425 | 63510.7656 | 1.3027 | 6.4955 | 76.976 | 9.699 | 10.4129 | 526297.625 |
+ | 31500 | 0.6364 | 186.8569 | 61964.3867 | 1.3025 | 6.4926 | 77.011 | 9.703 | 10.4850 | 507814.625 |
+ | 32000 | 0.6465 | 187.5240 | 63332.0977 | 1.3023 | 6.4876 | 77.07 | 9.711 | 10.4742 | 522101.8438 |
+ | 32500 | 0.6566 | 186.2861 | 63367.8047 | 1.3026 | 6.4849 | 77.102 | 9.715 | 10.3554 | 519323.4375 |
+ | 33000 | 0.6667 | 186.8931 | 63689.9375 | 1.3017 | 6.4831 | 77.124 | 9.718 | 10.3918 | 512715.4062 |
+ | 33500 | 0.6768 | 186.5604 | 62481.4766 | 1.3019 | 6.4927 | 77.009 | 9.703 | 10.4603 | 514084.8438 |
+ | 34000 | 0.6869 | 184.0769 | 61546.8828 | 1.3023 | 6.4923 | 77.015 | 9.704 | 10.2982 | 496560.4062 |
+ | 34500 | 0.6970 | 186.6327 | 63082.8438 | 1.3019 | 6.5074 | 76.835 | 9.681 | 10.4202 | 525455.5625 |
+ | 35000 | 0.7071 | 186.0409 | 62640.1211 | 1.3019 | 6.5011 | 76.91 | 9.691 | 10.3639 | 514084.8438 |
+ | 35500 | 0.7172 | 187.6039 | 63707.9219 | 1.3019 | 6.4981 | 76.946 | 9.695 | 10.4348 | 522380.75 |
+ | 36000 | 0.7273 | 186.3293 | 61503.5234 | 1.3017 | 6.5057 | 76.855 | 9.684 | 10.4405 | 502155.9688 |
+ | 36500 | 0.7374 | 185.9689 | 62104.1836 | 1.3017 | 6.5178 | 76.713 | 9.666 | 10.4500 | 502960.2188 |
+ | 37000 | 0.7475 | 187.1611 | 63672.0234 | 1.3019 | 6.5124 | 76.776 | 9.674 | 10.4625 | 526578.75 |
+ | 37500 | 0.7576 | 187.6329 | 64212.4258 | 1.3021 | 6.5257 | 76.62 | 9.654 | 10.4219 | 535504.0625 |
+ | 38000 | 0.7677 | 186.3005 | 63367.8047 | 1.3022 | 6.5192 | 76.696 | 9.664 | 10.3669 | 527703.875 |
+ | 38500 | 0.7778 | 186.4738 | 63618.2461 | 1.3023 | 6.5003 | 76.919 | 9.692 | 10.3712 | 526016.625 |
+ | 39000 | 0.7879 | 187.4296 | 63528.6992 | 1.3019 | 6.5045 | 76.87 | 9.686 | 10.4172 | 528267.3125 |
+ | 39500 | 0.7980 | 186.7194 | 63403.5312 | 1.3022 | 6.4969 | 76.96 | 9.697 | 10.3845 | 524895.125 |
+ | 40000 | 0.8081 | 187.0089 | 62905.3789 | 1.3020 | 6.5261 | 76.616 | 9.654 | 10.4500 | 521267.0 |
+ | 40500 | 0.8182 | 186.1706 | 62384.7812 | 1.3023 | 6.5159 | 76.735 | 9.669 | 10.3442 | 511895.0938 |
+ | 41000 | 0.8283 | 186.3582 | 62728.4141 | 1.3019 | 6.5084 | 76.824 | 9.68 | 10.4120 | 516835.75 |
+ | 41500 | 0.8384 | 187.4441 | 63995.7031 | 1.3019 | 6.5048 | 76.866 | 9.685 | 10.4211 | 529396.0 |
+ | 42000 | 0.8485 | 187.6329 | 64104.0039 | 1.3021 | 6.4997 | 76.926 | 9.693 | 10.4370 | 530102.25 |
+ | 42500 | 0.8586 | 186.6761 | 63296.4727 | 1.3021 | 6.5063 | 76.849 | 9.683 | 10.3721 | 520988.6875 |
+ | 43000 | 0.8687 | 187.4877 | 63582.3984 | 1.3020 | 6.4889 | 77.055 | 9.709 | 10.4103 | 523217.375 |
+ | 43500 | 0.8788 | 186.7484 | 63047.3008 | 1.3021 | 6.489 | 77.054 | 9.709 | 10.4034 | 521267.0 |
+ | 44000 | 0.8889 | 187.2409 | 62940.8438 | 1.3019 | 6.4957 | 76.974 | 9.699 | 10.4370 | 520711.0312 |
+ | 44500 | 0.8990 | 186.4449 | 62905.3789 | 1.3021 | 6.5115 | 76.787 | 9.675 | 10.3978 | 521544.9688 |
+ | 45000 | 0.9091 | 186.5315 | 63047.3008 | 1.3019 | 6.5135 | 76.764 | 9.672 | 10.4008 | 520711.0312 |
+ | 45500 | 0.9192 | 186.4449 | 62869.9336 | 1.3019 | 6.5176 | 76.715 | 9.666 | 10.4013 | 514633.75 |
+ | 46000 | 0.9293 | 186.5460 | 62799.1641 | 1.3019 | 6.505 | 76.864 | 9.685 | 10.4120 | 513810.875 |
+ | 46500 | 0.9394 | 186.4449 | 62551.9492 | 1.3020 | 6.5435 | 76.412 | 9.628 | 10.3922 | 513810.875 |
+ | 47000 | 0.9495 | 186.5315 | 62587.2148 | 1.3020 | 6.5419 | 76.431 | 9.63 | 10.4094 | 515458.4688 |
+ | 47500 | 0.9596 | 186.5460 | 62587.2148 | 1.3019 | 6.5054 | 76.859 | 9.684 | 10.4094 | 516008.8438 |
+ | 48000 | 0.9697 | 186.8352 | 62763.7812 | 1.3019 | 6.5119 | 76.783 | 9.675 | 10.4262 | 517111.3438 |
+ | 48500 | 0.9798 | 186.7339 | 62728.4141 | 1.3019 | 6.5272 | 76.603 | 9.652 | 10.4155 | 516008.8438 |
+ | 49000 | 0.9899 | 186.7918 | 62728.4141 | 1.3019 | 6.5054 | 76.86 | 9.684 | 10.4172 | 515733.8125 |
+ | 49500 | 1.0 | 186.7918 | 62728.4141 | 1.3019 | 6.5072 | 76.838 | 9.682 | 10.4215 | 515733.8125 |
 
 ### Framework versions
 - Distily 0.2.0
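Note on the `distillation_objective` in the diff above: with the logits component at weight 1 using `loss_fn=kl`, and the hidden-state (`hs`) and attention (`attn`) components at weight 0, the student is trained purely to match the teacher's output token distribution. The sketch below is illustrative only, not Distily's actual implementation; the `temperature` parameter is an assumption, since the config does not specify one.

```python
import torch.nn.functional as F
from torch import Tensor

def kl_logits_loss(student_logits: Tensor, teacher_logits: Tensor,
                   temperature: float = 1.0) -> Tensor:
    """KL(teacher || student) over the vocabulary at each position.

    Sketch of a weight-1 `loss_fn=kl` logits component; the hs and attn
    components above have weight 0, so they are omitted here.
    """
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    # log_target=True: the target is given as log-probabilities;
    # the T^2 factor is the conventional rescaling when T != 1.
    return F.kl_div(student_logp, teacher_logp,
                    log_target=True, reduction="batchmean") * temperature ** 2
```

For reading the eval table: the `*ppl` columns appear to be perplexities (exp of the mean token negative log-likelihood) on the named corpora, so lower is better; under this run's settings the student's final enwikippl of 186.7918 approaches the teacher's 169.9865.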
logs/dropout=0, learning_rate=0.004, per_device_train_batch_size=1, weight_decay=0.001/events.out.tfevents.1723906923.5f530b1cf724 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2cf00ae38431dc17de9bdf10c5b94497dd15b92b6626d0caee0314f373893ff3
+ size 312
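The added logs file is stored via Git LFS: the three `+` lines above are the entire pointer file (spec version URL, content hash, byte size), not the TensorBoard event data itself. A minimal sketch of reading such a pointer, assuming the spec-v1 key/value layout (the `parse_lfs_pointer` helper is hypothetical):

```python
def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:2cf00ae38431dc17de9bdf10c5b94497dd15b92b6626d0caee0314f373893ff3
size 312
"""
fields = parse_lfs_pointer(pointer)
assert fields["size"] == "312"  # the real payload is fetched from LFS storage by oid
```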