mt5-base-b8-e16-t58k-jupyter
This model is a fine-tuned version of google/mt5-base on an unknown dataset. The results below are from the evaluation set and match the step-13000 row of the training results table, which has the lowest validation loss:
- Loss: 0.2383
- Rouge1: 67.8581
- Rouge2: 59.4302
- Rougel: 67.7221
- Rougelsum: 67.6929
- Gen Len: 19.1351
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adafactor (OptimizerNames.ADAFACTOR), with no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 16
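The hyperparameters above, combined with two rows of the training results table (step 29000 at epoch 4.0 and step 116000 at epoch 16.0), allow a rough reconstruction of the run's scale. The sketch below is only an illustration. It assumes a single device with no gradient accumulation, and zero warmup steps for the linear schedule; the card states none of these. The helper name `linear_lr` is hypothetical.

```python
# Values copied from the hyperparameter list and the results table in this card.
TRAIN_BATCH_SIZE = 8
NUM_EPOCHS = 16
PEAK_LR = 1e-3
STEP_AT_EPOCH_4 = 29_000   # table row where Epoch is 4.0
FINAL_STEP = 116_000       # table row where Epoch is 16.0

# Assumption: one device and no gradient accumulation, so one optimizer
# step consumes exactly TRAIN_BATCH_SIZE examples.
steps_per_epoch = STEP_AT_EPOCH_4 // 4
approx_train_examples = steps_per_epoch * TRAIN_BATCH_SIZE
total_steps = steps_per_epoch * NUM_EPOCHS


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Linear warmup followed by linear decay to zero.

    warmup_steps=0 is an assumption; the card does not list a warmup setting.
    """
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)


print("steps per epoch:", steps_per_epoch)
print("approx. training examples:", approx_train_examples)
print("total steps:", total_steps)
print("learning rate halfway through:", linear_lr(total_steps // 2))
```

Under these assumptions the run covers about 7,250 steps per epoch and roughly 58,000 training examples (the last batch of each epoch may be smaller). That would be consistent with the `t58k` in the model name, though the card does not say what that suffix means.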
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
---|---|---|---|---|---|---|---|---|
0.8846 | 0.1379 | 1000 | 0.4117 | 58.1956 | 47.2104 | 57.8222 | 57.8168 | 19.0252 |
0.506 | 0.2759 | 2000 | 0.3460 | 59.4398 | 49.0483 | 59.2127 | 59.2088 | 18.9282 |
0.4479 | 0.4138 | 3000 | 0.3214 | 64.1438 | 54.2258 | 63.8836 | 63.8605 | 19.1916 |
0.3963 | 0.5517 | 4000 | 0.2981 | 65.6803 | 56.0098 | 65.4391 | 65.4087 | 19.1486 |
0.3771 | 0.6897 | 5000 | 0.2828 | 67.1769 | 57.8865 | 66.9535 | 66.9212 | 19.1732 |
0.3478 | 0.8276 | 6000 | 0.2723 | 67.2249 | 58.1005 | 67.0126 | 66.9823 | 19.1721 |
0.3332 | 0.9655 | 7000 | 0.2563 | 67.6065 | 58.7448 | 67.4238 | 67.3956 | 19.1829 |
0.2553 | 1.1034 | 8000 | 0.2752 | 66.6302 | 57.9725 | 66.4682 | 66.4449 | 19.1119 |
0.2396 | 1.2414 | 9000 | 0.2604 | 67.9608 | 59.219 | 67.7935 | 67.76 | 19.1086 |
0.2369 | 1.3793 | 10000 | 0.2555 | 68.1951 | 59.7011 | 68.0417 | 68.0072 | 19.1754 |
0.2294 | 1.5172 | 11000 | 0.2508 | 68.0577 | 59.5206 | 67.8871 | 67.8552 | 19.1344 |
0.2265 | 1.6552 | 12000 | 0.2424 | 67.4119 | 58.7834 | 67.1999 | 67.1837 | 19.1054 |
0.2228 | 1.7931 | 13000 | 0.2383 | 67.8581 | 59.4302 | 67.7221 | 67.6929 | 19.1351 |
0.2216 | 1.9310 | 14000 | 0.2391 | 69.1897 | 61.0788 | 69.0352 | 69.0068 | 19.1914 |
0.1769 | 2.0690 | 15000 | 0.2494 | 69.458 | 61.4372 | 69.3297 | 69.3115 | 19.1916 |
0.153 | 2.2069 | 16000 | 0.2477 | 69.357 | 61.455 | 69.2285 | 69.1898 | 19.1746 |
0.1512 | 2.3448 | 17000 | 0.2619 | 68.6959 | 60.5759 | 68.5428 | 68.5083 | 19.1279 |
0.1522 | 2.4828 | 18000 | 0.2530 | 69.2438 | 61.3816 | 69.1173 | 69.0815 | 19.1781 |
0.1563 | 2.6207 | 19000 | 0.2466 | 68.6796 | 60.6531 | 68.5412 | 68.4936 | 19.1156 |
0.1507 | 2.7586 | 20000 | 0.2508 | 69.8035 | 61.9634 | 69.683 | 69.6408 | 19.1696 |
0.1533 | 2.8966 | 21000 | 0.2517 | 70.4486 | 62.8697 | 70.3275 | 70.2894 | 19.1685 |
0.1369 | 3.0345 | 22000 | 0.2619 | 70.0921 | 62.4997 | 69.979 | 69.9386 | 19.1621 |
0.1007 | 3.1724 | 23000 | 0.2644 | 69.8367 | 62.0731 | 69.7019 | 69.6681 | 19.1614 |
0.104 | 3.3103 | 24000 | 0.2652 | 69.9394 | 62.3593 | 69.8122 | 69.7678 | 19.1571 |
0.1068 | 3.4483 | 25000 | 0.2655 | 70.1364 | 62.553 | 70.0215 | 69.9713 | 19.1851 |
0.1106 | 3.5862 | 26000 | 0.2591 | 69.9154 | 62.1538 | 69.7887 | 69.7509 | 19.1754 |
0.1085 | 3.7241 | 27000 | 0.2687 | 70.1055 | 62.5805 | 69.9928 | 69.9442 | 19.1585 |
0.1099 | 3.8621 | 28000 | 0.2568 | 69.7394 | 61.9854 | 69.6332 | 69.5986 | 19.1552 |
0.1087 | 4.0 | 29000 | 0.2615 | 70.1983 | 62.6796 | 70.0887 | 70.058 | 19.1558 |
0.0686 | 4.1379 | 30000 | 0.2742 | 69.0479 | 61.407 | 68.9306 | 68.9027 | 19.1205 |
0.0746 | 4.2759 | 31000 | 0.2689 | 70.4406 | 62.9053 | 70.3163 | 70.2855 | 19.1558 |
0.0758 | 4.4138 | 32000 | 0.2777 | 70.3994 | 62.855 | 70.287 | 70.2398 | 19.163 |
0.0772 | 4.5517 | 33000 | 0.2726 | 70.2671 | 62.757 | 70.1461 | 70.1029 | 19.1535 |
0.0788 | 4.6897 | 34000 | 0.2770 | 69.5882 | 61.8849 | 69.4835 | 69.4429 | 19.1656 |
0.0797 | 4.8276 | 35000 | 0.2764 | 69.9145 | 62.404 | 69.789 | 69.7598 | 19.1639 |
0.079 | 4.9655 | 36000 | 0.2747 | 69.4204 | 61.8999 | 69.3158 | 69.2775 | 19.1845 |
0.0587 | 5.1034 | 37000 | 0.2895 | 70.3877 | 63.0522 | 70.2784 | 70.235 | 19.1519 |
0.0527 | 5.2414 | 38000 | 0.2862 | 69.4405 | 61.8645 | 69.3228 | 69.28 | 19.1638 |
0.0533 | 5.3793 | 39000 | 0.2888 | 69.3483 | 61.8568 | 69.2428 | 69.2095 | 19.1431 |
0.0545 | 5.5172 | 40000 | 0.2946 | 68.2516 | 60.621 | 68.1208 | 68.0861 | 19.1365 |
0.0572 | 5.6552 | 41000 | 0.3024 | 69.4355 | 61.9683 | 69.3316 | 69.2974 | 19.1386 |
0.0592 | 5.7931 | 42000 | 0.2909 | 69.2934 | 61.6953 | 69.181 | 69.1549 | 19.158 |
0.06 | 5.9310 | 43000 | 0.2912 | 68.983 | 61.4127 | 68.8525 | 68.8369 | 19.1446 |
0.0473 | 6.0690 | 44000 | 0.3238 | 70.1649 | 62.7399 | 70.0521 | 70.0103 | 19.1725 |
0.0377 | 6.2069 | 45000 | 0.2999 | 67.2192 | 59.4114 | 67.0974 | 67.0716 | 19.1046 |
0.0407 | 6.3448 | 46000 | 0.3181 | 67.2047 | 59.6483 | 67.0705 | 67.046 | 19.2064 |
0.0404 | 6.4828 | 47000 | 0.3069 | 67.5288 | 60.02 | 67.424 | 67.3754 | 19.1977 |
0.0431 | 6.6207 | 48000 | 0.3144 | 69.8076 | 62.3682 | 69.6792 | 69.6511 | 19.1865 |
0.0427 | 6.7586 | 49000 | 0.3154 | 69.2514 | 61.7694 | 69.1338 | 69.098 | 19.2413 |
0.0438 | 6.8966 | 50000 | 0.3042 | 69.8743 | 62.4372 | 69.7374 | 69.6987 | 19.1699 |
0.0395 | 7.0345 | 51000 | 0.3258 | 70.7026 | 63.616 | 70.6003 | 70.5601 | 19.184 |
0.0284 | 7.1724 | 52000 | 0.3281 | 70.2281 | 62.9315 | 70.1089 | 70.0755 | 19.1556 |
0.0295 | 7.3103 | 53000 | 0.3351 | 70.5253 | 63.3491 | 70.4193 | 70.3758 | 19.1659 |
0.0296 | 7.4483 | 54000 | 0.3382 | 70.799 | 63.6006 | 70.6772 | 70.6426 | 19.1755 |
0.0309 | 7.5862 | 55000 | 0.3292 | 69.4402 | 62.1832 | 69.3146 | 69.2855 | 19.18 |
0.0309 | 7.7241 | 56000 | 0.3358 | 70.628 | 63.4921 | 70.5047 | 70.4596 | 19.1676 |
0.0301 | 7.8621 | 57000 | 0.3433 | 70.4413 | 63.2207 | 70.3199 | 70.2882 | 19.1719 |
0.032 | 8.0 | 58000 | 0.3228 | 70.5812 | 63.4071 | 70.4684 | 70.4319 | 19.1773 |
0.0209 | 8.1379 | 59000 | 0.3496 | 70.4711 | 63.1854 | 70.3596 | 70.3184 | 19.1645 |
0.0201 | 8.2759 | 60000 | 0.3559 | 70.454 | 63.2365 | 70.3396 | 70.3065 | 19.18 |
0.0216 | 8.4138 | 61000 | 0.3452 | 70.7055 | 63.551 | 70.5961 | 70.5624 | 19.1603 |
0.0224 | 8.5517 | 62000 | 0.3425 | 70.6933 | 63.5627 | 70.5959 | 70.5621 | 19.1645 |
0.0228 | 8.6897 | 63000 | 0.3489 | 70.329 | 63.2136 | 70.2368 | 70.1908 | 19.1737 |
0.0231 | 8.8276 | 64000 | 0.3502 | 70.5163 | 63.3691 | 70.4183 | 70.3802 | 19.1631 |
0.0226 | 8.9655 | 65000 | 0.3495 | 70.3915 | 63.1873 | 70.2895 | 70.2553 | 19.1594 |
0.0177 | 9.1034 | 66000 | 0.3665 | 70.3584 | 63.2121 | 70.264 | 70.2218 | 19.1728 |
0.0155 | 9.2414 | 67000 | 0.3668 | 70.4038 | 63.2343 | 70.3021 | 70.2646 | 19.1545 |
0.0156 | 9.3793 | 68000 | 0.3858 | 70.5827 | 63.3876 | 70.4804 | 70.4326 | 19.1588 |
0.0164 | 9.5172 | 69000 | 0.3644 | 70.9777 | 63.9787 | 70.8733 | 70.8334 | 19.1545 |
0.0169 | 9.6552 | 70000 | 0.3693 | 70.7079 | 63.6767 | 70.6034 | 70.5633 | 19.1623 |
0.0169 | 9.7931 | 71000 | 0.3734 | 70.5823 | 63.4672 | 70.4804 | 70.4471 | 19.1618 |
0.0179 | 9.9310 | 72000 | 0.3649 | 70.6698 | 63.6145 | 70.5783 | 70.5333 | 19.157 |
0.0135 | 10.0690 | 73000 | 0.3920 | 70.6779 | 63.708 | 70.5869 | 70.5515 | 19.1776 |
0.011 | 10.2069 | 74000 | 0.3957 | 70.698 | 63.5898 | 70.5937 | 70.5593 | 19.1579 |
0.0112 | 10.3448 | 75000 | 0.3880 | 70.7058 | 63.6789 | 70.6075 | 70.5706 | 19.169 |
0.0124 | 10.4828 | 76000 | 0.3868 | 70.8377 | 63.8888 | 70.7722 | 70.7271 | 19.1614 |
0.0117 | 10.6207 | 77000 | 0.3835 | 70.225 | 63.1845 | 70.1412 | 70.103 | 19.1719 |
0.0118 | 10.7586 | 78000 | 0.3999 | 70.2805 | 63.1728 | 70.1672 | 70.1469 | 19.1835 |
0.0118 | 10.8966 | 79000 | 0.3967 | 69.99 | 62.9694 | 69.9026 | 69.8748 | 19.1981 |
0.0104 | 11.0345 | 80000 | 0.4053 | 70.2508 | 63.3003 | 70.1657 | 70.1329 | 19.2093 |
0.0084 | 11.1724 | 81000 | 0.4122 | 70.0517 | 63.0268 | 69.9457 | 69.9151 | 19.1805 |
0.0086 | 11.3103 | 82000 | 0.4025 | 70.0812 | 63.1975 | 69.9947 | 69.9587 | 19.1872 |
0.0082 | 11.4483 | 83000 | 0.4236 | 70.7138 | 63.7789 | 70.6047 | 70.5832 | 19.186 |
0.0082 | 11.5862 | 84000 | 0.4277 | 70.336 | 63.4501 | 70.2562 | 70.2147 | 19.177 |
0.0081 | 11.7241 | 85000 | 0.4084 | 69.918 | 62.8952 | 69.827 | 69.7926 | 19.1648 |
0.0085 | 11.8621 | 86000 | 0.4193 | 70.4653 | 63.5381 | 70.3767 | 70.3355 | 19.1803 |
0.0078 | 12.0 | 87000 | 0.4321 | 69.7744 | 62.7571 | 69.6712 | 69.6459 | 19.1737 |
0.0054 | 12.1379 | 88000 | 0.4411 | 69.9454 | 62.9937 | 69.8503 | 69.8164 | 19.1717 |
0.0056 | 12.2759 | 89000 | 0.4416 | 70.563 | 63.6044 | 70.4674 | 70.4182 | 19.1784 |
0.0058 | 12.4138 | 90000 | 0.4375 | 69.9383 | 62.9516 | 69.8393 | 69.8138 | 19.191 |
0.0057 | 12.5517 | 91000 | 0.4402 | 69.9012 | 62.9782 | 69.7933 | 69.7575 | 19.1724 |
0.005 | 12.6897 | 92000 | 0.4438 | 69.4525 | 62.4818 | 69.357 | 69.3203 | 19.1685 |
0.0058 | 12.8276 | 93000 | 0.4474 | 70.2012 | 63.2868 | 70.1103 | 70.0678 | 19.1769 |
0.0051 | 12.9655 | 94000 | 0.4434 | 69.8342 | 62.9167 | 69.7348 | 69.704 | 19.178 |
0.0045 | 13.1034 | 95000 | 0.4649 | 69.9998 | 63.0079 | 69.9039 | 69.8638 | 19.1893 |
0.0038 | 13.2414 | 96000 | 0.4640 | 70.2866 | 63.3656 | 70.1951 | 70.1636 | 19.1724 |
0.0039 | 13.3793 | 97000 | 0.4702 | 70.3794 | 63.5196 | 70.2901 | 70.2546 | 19.1814 |
0.0042 | 13.5172 | 98000 | 0.4718 | 70.6757 | 63.8287 | 70.5752 | 70.5308 | 19.1701 |
0.0039 | 13.6552 | 99000 | 0.4816 | 70.4003 | 63.5464 | 70.3017 | 70.267 | 19.1517 |
0.0037 | 13.7931 | 100000 | 0.4718 | 70.3157 | 63.4538 | 70.2295 | 70.1942 | 19.1746 |
0.0039 | 13.9310 | 101000 | 0.4643 | 70.5475 | 63.6863 | 70.4635 | 70.4256 | 19.1876 |
0.0031 | 14.0690 | 102000 | 0.4908 | 70.3373 | 63.4716 | 70.2509 | 70.2061 | 19.174 |
0.0028 | 14.2069 | 103000 | 0.4915 | 70.5516 | 63.6897 | 70.4538 | 70.4149 | 19.1822 |
0.0027 | 14.3448 | 104000 | 0.5110 | 70.5811 | 63.7133 | 70.4927 | 70.4498 | 19.1751 |
0.0025 | 14.4828 | 105000 | 0.4974 | 70.3384 | 63.4383 | 70.2609 | 70.2126 | 19.1769 |
0.0026 | 14.6207 | 106000 | 0.5010 | 70.7552 | 63.91 | 70.6802 | 70.6254 | 19.1729 |
0.0029 | 14.7586 | 107000 | 0.4989 | 70.78 | 63.9408 | 70.6996 | 70.6527 | 19.1654 |
0.0023 | 14.8966 | 108000 | 0.5118 | 70.8186 | 64.0192 | 70.7348 | 70.691 | 19.1702 |
0.0028 | 15.0345 | 109000 | 0.5058 | 70.8076 | 63.9936 | 70.7282 | 70.6829 | 19.1612 |
0.0021 | 15.1724 | 110000 | 0.5094 | 70.5992 | 63.7622 | 70.5161 | 70.4735 | 19.1668 |
0.002 | 15.3103 | 111000 | 0.5148 | 70.6299 | 63.8373 | 70.5413 | 70.4962 | 19.1736 |
0.002 | 15.4483 | 112000 | 0.5197 | 70.6815 | 63.9035 | 70.5942 | 70.5562 | 19.1678 |
0.0018 | 15.5862 | 113000 | 0.5218 | 70.6869 | 63.9007 | 70.606 | 70.5618 | 19.1734 |
0.0019 | 15.7241 | 114000 | 0.5232 | 70.6718 | 63.8988 | 70.5768 | 70.5393 | 19.1671 |
0.0019 | 15.8621 | 115000 | 0.5242 | 70.6659 | 63.9002 | 70.5822 | 70.5401 | 19.1739 |
0.0021 | 16.0 | 116000 | 0.5243 | 70.6792 | 63.9083 | 70.5913 | 70.5479 | 19.1747 |
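In the table above, validation loss reaches its minimum early (0.2383 at step 13000) and then rises steadily while training loss keeps falling, which is the usual overfitting pattern; the ROUGE scores nonetheless keep improving slightly and peak much later. The sketch below shows one way to pick checkpoints by either criterion. It uses only a handful of rows copied from the table, and the column keys and the `parse_rows` helper are hypothetical names chosen for illustration.

```python
# A subset of rows copied verbatim from the training results table.
ROWS = """
0.3332 | 0.9655 | 7000 | 0.2563 | 67.6065 | 58.7448 | 67.4238 | 67.3956 | 19.1829 |
0.2228 | 1.7931 | 13000 | 0.2383 | 67.8581 | 59.4302 | 67.7221 | 67.6929 | 19.1351 |
0.2216 | 1.9310 | 14000 | 0.2391 | 69.1897 | 61.0788 | 69.0352 | 69.0068 | 19.1914 |
0.1087 | 4.0 | 29000 | 0.2615 | 70.1983 | 62.6796 | 70.0887 | 70.058 | 19.1558 |
0.0164 | 9.5172 | 69000 | 0.3644 | 70.9777 | 63.9787 | 70.8733 | 70.8334 | 19.1545 |
0.0021 | 16.0 | 116000 | 0.5243 | 70.6792 | 63.9083 | 70.5913 | 70.5479 | 19.1747 |
"""

# Hypothetical short keys for the nine table columns, in table order.
COLUMNS = [
    "train_loss", "epoch", "step", "val_loss",
    "rouge1", "rouge2", "rougel", "rougelsum", "gen_len",
]


def parse_rows(text: str) -> list[dict[str, float]]:
    """Turn pipe-separated table rows into one dict per row."""
    records = []
    for line in text.strip().splitlines():
        # Drop the empty cell produced by the trailing pipe.
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        records.append(dict(zip(COLUMNS, map(float, cells))))
    return records


records = parse_rows(ROWS)
best_by_loss = min(records, key=lambda r: r["val_loss"])
best_by_rouge1 = max(records, key=lambda r: r["rouge1"])

print("lowest validation loss at step", int(best_by_loss["step"]))
print("highest Rouge1 at step", int(best_by_rouge1["step"]))
```

The two criteria select different checkpoints (step 13000 by validation loss, step 69000 by Rouge1). The headline metrics at the top of this card match the step-13000 row, so which checkpoint is preferable depends on whether loss or ROUGE matters more for the intended use.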
Framework versions
- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1
Model tree for fresst/mt5-base-b8-e16-t58k-jupyter
Base model
google/mt5-base