mt5-base-b8-e16-t58k-jupyter

This model is a fine-tuned version of google/mt5-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2383
  • Rouge1: 67.8581
  • Rouge2: 59.4302
  • Rougel: 67.7221
  • Rougelsum: 67.6929
  • Gen Len: 19.1351
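
A minimal inference sketch, assuming the checkpoint is published as fresst/mt5-base-b8-e16-t58k-jupyter (the id shown in the model tree below); since the task and dataset are undocumented, the input text and generation settings are illustrative only:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "fresst/mt5-base-b8-e16-t58k-jupyter"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# The training task is not documented; this only demonstrates generation.
inputs = tokenizer("Your input text here", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)  # eval Gen Len averaged ~19 tokens
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```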

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adafactor (OptimizerNames.ADAFACTOR, no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 16
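
A sketch of how these hyperparameters map onto Seq2SeqTrainingArguments (dataset loading and preprocessing are omitted since the training data is undocumented; output_dir and predict_with_generate are assumptions, and the eval schedule is inferred from the results table below):

```python
from transformers import Seq2SeqTrainingArguments

# Assumed reconstruction of the training setup; only the listed
# hyperparameters are confirmed by this card.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-b8-e16-t58k-jupyter",  # assumption
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adafactor",           # OptimizerNames.ADAFACTOR, no extra args
    lr_scheduler_type="linear",
    num_train_epochs=16,
    eval_strategy="steps",       # the results table evaluates every 1000 steps
    eval_steps=1000,
    predict_with_generate=True,  # assumption: required for the ROUGE columns
)
```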

Training results

Training Loss Epoch Step Validation Loss Rouge1 Rouge2 Rougel Rougelsum Gen Len
0.8846 0.1379 1000 0.4117 58.1956 47.2104 57.8222 57.8168 19.0252
0.506 0.2759 2000 0.3460 59.4398 49.0483 59.2127 59.2088 18.9282
0.4479 0.4138 3000 0.3214 64.1438 54.2258 63.8836 63.8605 19.1916
0.3963 0.5517 4000 0.2981 65.6803 56.0098 65.4391 65.4087 19.1486
0.3771 0.6897 5000 0.2828 67.1769 57.8865 66.9535 66.9212 19.1732
0.3478 0.8276 6000 0.2723 67.2249 58.1005 67.0126 66.9823 19.1721
0.3332 0.9655 7000 0.2563 67.6065 58.7448 67.4238 67.3956 19.1829
0.2553 1.1034 8000 0.2752 66.6302 57.9725 66.4682 66.4449 19.1119
0.2396 1.2414 9000 0.2604 67.9608 59.219 67.7935 67.76 19.1086
0.2369 1.3793 10000 0.2555 68.1951 59.7011 68.0417 68.0072 19.1754
0.2294 1.5172 11000 0.2508 68.0577 59.5206 67.8871 67.8552 19.1344
0.2265 1.6552 12000 0.2424 67.4119 58.7834 67.1999 67.1837 19.1054
0.2228 1.7931 13000 0.2383 67.8581 59.4302 67.7221 67.6929 19.1351
0.2216 1.9310 14000 0.2391 69.1897 61.0788 69.0352 69.0068 19.1914
0.1769 2.0690 15000 0.2494 69.458 61.4372 69.3297 69.3115 19.1916
0.153 2.2069 16000 0.2477 69.357 61.455 69.2285 69.1898 19.1746
0.1512 2.3448 17000 0.2619 68.6959 60.5759 68.5428 68.5083 19.1279
0.1522 2.4828 18000 0.2530 69.2438 61.3816 69.1173 69.0815 19.1781
0.1563 2.6207 19000 0.2466 68.6796 60.6531 68.5412 68.4936 19.1156
0.1507 2.7586 20000 0.2508 69.8035 61.9634 69.683 69.6408 19.1696
0.1533 2.8966 21000 0.2517 70.4486 62.8697 70.3275 70.2894 19.1685
0.1369 3.0345 22000 0.2619 70.0921 62.4997 69.979 69.9386 19.1621
0.1007 3.1724 23000 0.2644 69.8367 62.0731 69.7019 69.6681 19.1614
0.104 3.3103 24000 0.2652 69.9394 62.3593 69.8122 69.7678 19.1571
0.1068 3.4483 25000 0.2655 70.1364 62.553 70.0215 69.9713 19.1851
0.1106 3.5862 26000 0.2591 69.9154 62.1538 69.7887 69.7509 19.1754
0.1085 3.7241 27000 0.2687 70.1055 62.5805 69.9928 69.9442 19.1585
0.1099 3.8621 28000 0.2568 69.7394 61.9854 69.6332 69.5986 19.1552
0.1087 4.0 29000 0.2615 70.1983 62.6796 70.0887 70.058 19.1558
0.0686 4.1379 30000 0.2742 69.0479 61.407 68.9306 68.9027 19.1205
0.0746 4.2759 31000 0.2689 70.4406 62.9053 70.3163 70.2855 19.1558
0.0758 4.4138 32000 0.2777 70.3994 62.855 70.287 70.2398 19.163
0.0772 4.5517 33000 0.2726 70.2671 62.757 70.1461 70.1029 19.1535
0.0788 4.6897 34000 0.2770 69.5882 61.8849 69.4835 69.4429 19.1656
0.0797 4.8276 35000 0.2764 69.9145 62.404 69.789 69.7598 19.1639
0.079 4.9655 36000 0.2747 69.4204 61.8999 69.3158 69.2775 19.1845
0.0587 5.1034 37000 0.2895 70.3877 63.0522 70.2784 70.235 19.1519
0.0527 5.2414 38000 0.2862 69.4405 61.8645 69.3228 69.28 19.1638
0.0533 5.3793 39000 0.2888 69.3483 61.8568 69.2428 69.2095 19.1431
0.0545 5.5172 40000 0.2946 68.2516 60.621 68.1208 68.0861 19.1365
0.0572 5.6552 41000 0.3024 69.4355 61.9683 69.3316 69.2974 19.1386
0.0592 5.7931 42000 0.2909 69.2934 61.6953 69.181 69.1549 19.158
0.06 5.9310 43000 0.2912 68.983 61.4127 68.8525 68.8369 19.1446
0.0473 6.0690 44000 0.3238 70.1649 62.7399 70.0521 70.0103 19.1725
0.0377 6.2069 45000 0.2999 67.2192 59.4114 67.0974 67.0716 19.1046
0.0407 6.3448 46000 0.3181 67.2047 59.6483 67.0705 67.046 19.2064
0.0404 6.4828 47000 0.3069 67.5288 60.02 67.424 67.3754 19.1977
0.0431 6.6207 48000 0.3144 69.8076 62.3682 69.6792 69.6511 19.1865
0.0427 6.7586 49000 0.3154 69.2514 61.7694 69.1338 69.098 19.2413
0.0438 6.8966 50000 0.3042 69.8743 62.4372 69.7374 69.6987 19.1699
0.0395 7.0345 51000 0.3258 70.7026 63.616 70.6003 70.5601 19.184
0.0284 7.1724 52000 0.3281 70.2281 62.9315 70.1089 70.0755 19.1556
0.0295 7.3103 53000 0.3351 70.5253 63.3491 70.4193 70.3758 19.1659
0.0296 7.4483 54000 0.3382 70.799 63.6006 70.6772 70.6426 19.1755
0.0309 7.5862 55000 0.3292 69.4402 62.1832 69.3146 69.2855 19.18
0.0309 7.7241 56000 0.3358 70.628 63.4921 70.5047 70.4596 19.1676
0.0301 7.8621 57000 0.3433 70.4413 63.2207 70.3199 70.2882 19.1719
0.032 8.0 58000 0.3228 70.5812 63.4071 70.4684 70.4319 19.1773
0.0209 8.1379 59000 0.3496 70.4711 63.1854 70.3596 70.3184 19.1645
0.0201 8.2759 60000 0.3559 70.454 63.2365 70.3396 70.3065 19.18
0.0216 8.4138 61000 0.3452 70.7055 63.551 70.5961 70.5624 19.1603
0.0224 8.5517 62000 0.3425 70.6933 63.5627 70.5959 70.5621 19.1645
0.0228 8.6897 63000 0.3489 70.329 63.2136 70.2368 70.1908 19.1737
0.0231 8.8276 64000 0.3502 70.5163 63.3691 70.4183 70.3802 19.1631
0.0226 8.9655 65000 0.3495 70.3915 63.1873 70.2895 70.2553 19.1594
0.0177 9.1034 66000 0.3665 70.3584 63.2121 70.264 70.2218 19.1728
0.0155 9.2414 67000 0.3668 70.4038 63.2343 70.3021 70.2646 19.1545
0.0156 9.3793 68000 0.3858 70.5827 63.3876 70.4804 70.4326 19.1588
0.0164 9.5172 69000 0.3644 70.9777 63.9787 70.8733 70.8334 19.1545
0.0169 9.6552 70000 0.3693 70.7079 63.6767 70.6034 70.5633 19.1623
0.0169 9.7931 71000 0.3734 70.5823 63.4672 70.4804 70.4471 19.1618
0.0179 9.9310 72000 0.3649 70.6698 63.6145 70.5783 70.5333 19.157
0.0135 10.0690 73000 0.3920 70.6779 63.708 70.5869 70.5515 19.1776
0.011 10.2069 74000 0.3957 70.698 63.5898 70.5937 70.5593 19.1579
0.0112 10.3448 75000 0.3880 70.7058 63.6789 70.6075 70.5706 19.169
0.0124 10.4828 76000 0.3868 70.8377 63.8888 70.7722 70.7271 19.1614
0.0117 10.6207 77000 0.3835 70.225 63.1845 70.1412 70.103 19.1719
0.0118 10.7586 78000 0.3999 70.2805 63.1728 70.1672 70.1469 19.1835
0.0118 10.8966 79000 0.3967 69.99 62.9694 69.9026 69.8748 19.1981
0.0104 11.0345 80000 0.4053 70.2508 63.3003 70.1657 70.1329 19.2093
0.0084 11.1724 81000 0.4122 70.0517 63.0268 69.9457 69.9151 19.1805
0.0086 11.3103 82000 0.4025 70.0812 63.1975 69.9947 69.9587 19.1872
0.0082 11.4483 83000 0.4236 70.7138 63.7789 70.6047 70.5832 19.186
0.0082 11.5862 84000 0.4277 70.336 63.4501 70.2562 70.2147 19.177
0.0081 11.7241 85000 0.4084 69.918 62.8952 69.827 69.7926 19.1648
0.0085 11.8621 86000 0.4193 70.4653 63.5381 70.3767 70.3355 19.1803
0.0078 12.0 87000 0.4321 69.7744 62.7571 69.6712 69.6459 19.1737
0.0054 12.1379 88000 0.4411 69.9454 62.9937 69.8503 69.8164 19.1717
0.0056 12.2759 89000 0.4416 70.563 63.6044 70.4674 70.4182 19.1784
0.0058 12.4138 90000 0.4375 69.9383 62.9516 69.8393 69.8138 19.191
0.0057 12.5517 91000 0.4402 69.9012 62.9782 69.7933 69.7575 19.1724
0.005 12.6897 92000 0.4438 69.4525 62.4818 69.357 69.3203 19.1685
0.0058 12.8276 93000 0.4474 70.2012 63.2868 70.1103 70.0678 19.1769
0.0051 12.9655 94000 0.4434 69.8342 62.9167 69.7348 69.704 19.178
0.0045 13.1034 95000 0.4649 69.9998 63.0079 69.9039 69.8638 19.1893
0.0038 13.2414 96000 0.4640 70.2866 63.3656 70.1951 70.1636 19.1724
0.0039 13.3793 97000 0.4702 70.3794 63.5196 70.2901 70.2546 19.1814
0.0042 13.5172 98000 0.4718 70.6757 63.8287 70.5752 70.5308 19.1701
0.0039 13.6552 99000 0.4816 70.4003 63.5464 70.3017 70.267 19.1517
0.0037 13.7931 100000 0.4718 70.3157 63.4538 70.2295 70.1942 19.1746
0.0039 13.9310 101000 0.4643 70.5475 63.6863 70.4635 70.4256 19.1876
0.0031 14.0690 102000 0.4908 70.3373 63.4716 70.2509 70.2061 19.174
0.0028 14.2069 103000 0.4915 70.5516 63.6897 70.4538 70.4149 19.1822
0.0027 14.3448 104000 0.5110 70.5811 63.7133 70.4927 70.4498 19.1751
0.0025 14.4828 105000 0.4974 70.3384 63.4383 70.2609 70.2126 19.1769
0.0026 14.6207 106000 0.5010 70.7552 63.91 70.6802 70.6254 19.1729
0.0029 14.7586 107000 0.4989 70.78 63.9408 70.6996 70.6527 19.1654
0.0023 14.8966 108000 0.5118 70.8186 64.0192 70.7348 70.691 19.1702
0.0028 15.0345 109000 0.5058 70.8076 63.9936 70.7282 70.6829 19.1612
0.0021 15.1724 110000 0.5094 70.5992 63.7622 70.5161 70.4735 19.1668
0.002 15.3103 111000 0.5148 70.6299 63.8373 70.5413 70.4962 19.1736
0.002 15.4483 112000 0.5197 70.6815 63.9035 70.5942 70.5562 19.1678
0.0018 15.5862 113000 0.5218 70.6869 63.9007 70.606 70.5618 19.1734
0.0019 15.7241 114000 0.5232 70.6718 63.8988 70.5768 70.5393 19.1671
0.0019 15.8621 115000 0.5242 70.6659 63.9002 70.5822 70.5401 19.1739
0.0021 16.0 116000 0.5243 70.6792 63.9083 70.5913 70.5479 19.1747
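
The ROUGE columns match the output of the evaluate library's rouge metric scaled to percentages; a sketch of the kind of compute_metrics function that would produce such a log (the decoding and Gen Len details are assumptions):

```python
import numpy as np
import evaluate

rouge = evaluate.load("rouge")

def build_compute_metrics(tokenizer):
    def compute_metrics(eval_preds):
        preds, labels = eval_preds
        # Labels use -100 for padding; restore pad tokens before decoding.
        labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
        decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
        decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

        result = rouge.compute(predictions=decoded_preds, references=decoded_labels)
        result = {k: round(v * 100, 4) for k, v in result.items()}  # rouge1/2/L/Lsum
        # "Gen Len": mean generated length in non-pad tokens (assumption).
        result["gen_len"] = float(
            np.mean([np.count_nonzero(np.array(p) != tokenizer.pad_token_id) for p in preds])
        )
        return result

    return compute_metrics
```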

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.4.1
  • Tokenizers 0.21.1

Model details

  • Safetensors weights, 582M parameters, F32 tensors

Model tree for fresst/mt5-base-b8-e16-t58k-jupyter

  • Base model: google/mt5-base