---
library_name: transformers
license: apache-2.0
base_model: google/mt5-small
tags:
- summarization
- generated_from_trainer
metrics:
- rouge
model-index:
- name: mt5-small-synthetic-data-plus-translated
  results: []
---

# mt5-small-synthetic-data-plus-translated

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on a combined dataset of synthetic and translated summarization examples. It achieves the following results on the evaluation set:

- Loss: 0.5891
- Rouge1: 0.6390
- Rouge2: 0.5109
- RougeL: 0.6157
- RougeLsum: 0.6175

## Model description

More information needed

## Intended uses & limitations

More information needed
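
Pending fuller documentation, a minimal inference sketch follows. The Hub id is inferred from the model name, and the input text and generation settings are illustrative assumptions, not reported values:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "ak2603/mt5-small-synthetic-data-plus-translated"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "..."  # document to summarize (placeholder)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
# Beam search and the length cap are illustrative, not training-time settings.
summary_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```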

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding training arguments follows the list):

- learning_rate: 5.6e-05
- train_batch_size: 12
- eval_batch_size: 12
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 40
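
A hedged sketch of these settings expressed as `Seq2SeqTrainingArguments`; `output_dir`, the per-epoch evaluation cadence, and `predict_with_generate` are assumptions, not reported values:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-synthetic-data-plus-translated",  # assumed
    learning_rate=5.6e-5,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    seed=42,
    optim="adamw_torch",         # AdamW with the default betas/epsilon listed above
    lr_scheduler_type="linear",
    num_train_epochs=40,
    eval_strategy="epoch",       # assumed: the results table logs once per epoch
    predict_with_generate=True,  # assumed: needed to compute ROUGE during eval
)
```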

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|
| 14.4747       | 1.0   | 100  | 4.4435          | 0.0225 | 0.0054 | 0.0205 | 0.0215    |
| 5.9023        | 2.0   | 200  | 1.9711          | 0.1865 | 0.0791 | 0.1562 | 0.1567    |
| 3.0374        | 3.0   | 300  | 1.3288          | 0.3668 | 0.2195 | 0.3565 | 0.3567    |
| 2.1905        | 4.0   | 400  | 1.1478          | 0.4430 | 0.2741 | 0.4186 | 0.4205    |
| 1.8996        | 5.0   | 500  | 1.0408          | 0.4754 | 0.3275 | 0.4564 | 0.4574    |
| 1.6959        | 6.0   | 600  | 0.9541          | 0.5463 | 0.3972 | 0.5258 | 0.5273    |
| 1.5593        | 7.0   | 700  | 0.8942          | 0.5594 | 0.4138 | 0.5406 | 0.5426    |
| 1.4334        | 8.0   | 800  | 0.8482          | 0.6064 | 0.4683 | 0.5855 | 0.5866    |
| 1.3929        | 9.0   | 900  | 0.8106          | 0.6130 | 0.4714 | 0.5895 | 0.5911    |
| 1.2918        | 10.0  | 1000 | 0.7851          | 0.6156 | 0.4770 | 0.5929 | 0.5935    |
| 1.2362        | 11.0  | 1100 | 0.7576          | 0.6270 | 0.4894 | 0.6054 | 0.6060    |
| 1.1781        | 12.0  | 1200 | 0.7402          | 0.6257 | 0.4867 | 0.6031 | 0.6042    |
| 1.1476        | 13.0  | 1300 | 0.7212          | 0.6221 | 0.4894 | 0.6018 | 0.6029    |
| 1.1052        | 14.0  | 1400 | 0.7064          | 0.6214 | 0.4873 | 0.5983 | 0.5995    |
| 1.0667        | 15.0  | 1500 | 0.6938          | 0.6300 | 0.4972 | 0.6073 | 0.6079    |
| 1.0421        | 16.0  | 1600 | 0.6855          | 0.6265 | 0.4952 | 0.6026 | 0.6036    |
| 1.0169        | 17.0  | 1700 | 0.6748          | 0.6244 | 0.4911 | 0.6021 | 0.6029    |
| 1.0036        | 18.0  | 1800 | 0.6599          | 0.6342 | 0.5087 | 0.6130 | 0.6142    |
| 0.9828        | 19.0  | 1900 | 0.6510          | 0.6349 | 0.5090 | 0.6136 | 0.6147    |
| 0.9589        | 20.0  | 2000 | 0.6471          | 0.6370 | 0.5074 | 0.6124 | 0.6135    |
| 0.9267        | 21.0  | 2100 | 0.6400          | 0.6345 | 0.5081 | 0.6117 | 0.6127    |
| 0.9361        | 22.0  | 2200 | 0.6318          | 0.6336 | 0.5066 | 0.6126 | 0.6140    |
| 0.8992        | 23.0  | 2300 | 0.6291          | 0.6346 | 0.5066 | 0.6122 | 0.6125    |
| 0.9029        | 24.0  | 2400 | 0.6224          | 0.6367 | 0.5103 | 0.6152 | 0.6166    |
| 0.8815        | 25.0  | 2500 | 0.6159          | 0.6374 | 0.5078 | 0.6141 | 0.6157    |
| 0.8914        | 26.0  | 2600 | 0.6133          | 0.6356 | 0.5109 | 0.6120 | 0.6138    |
| 0.8548        | 27.0  | 2700 | 0.6091          | 0.6371 | 0.5089 | 0.6125 | 0.6145    |
| 0.8683        | 28.0  | 2800 | 0.6047          | 0.6387 | 0.5131 | 0.6149 | 0.6169    |
| 0.8483        | 29.0  | 2900 | 0.6020          | 0.6368 | 0.5096 | 0.6121 | 0.6133    |
| 0.8409        | 30.0  | 3000 | 0.5996          | 0.6405 | 0.5118 | 0.6139 | 0.6159    |
| 0.8407        | 31.0  | 3100 | 0.5997          | 0.6398 | 0.5123 | 0.6159 | 0.6177    |
| 0.8338        | 32.0  | 3200 | 0.5970          | 0.6385 | 0.5096 | 0.6144 | 0.6164    |
| 0.801         | 33.0  | 3300 | 0.5947          | 0.6361 | 0.5078 | 0.6122 | 0.6141    |
| 0.833         | 34.0  | 3400 | 0.5941          | 0.6386 | 0.5111 | 0.6154 | 0.6172    |
| 0.7751        | 35.0  | 3500 | 0.5921          | 0.6368 | 0.5065 | 0.6129 | 0.6148    |
| 0.8281        | 36.0  | 3600 | 0.5906          | 0.6409 | 0.5125 | 0.6183 | 0.6199    |
| 0.7803        | 37.0  | 3700 | 0.5898          | 0.6377 | 0.5097 | 0.6143 | 0.6162    |
| 0.8139        | 38.0  | 3800 | 0.5896          | 0.6398 | 0.5116 | 0.6166 | 0.6185    |
| 0.7922        | 39.0  | 3900 | 0.5894          | 0.6388 | 0.5109 | 0.6156 | 0.6174    |
| 0.8269        | 40.0  | 4000 | 0.5891          | 0.6390 | 0.5109 | 0.6157 | 0.6175    |
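
The ROUGE columns above are in the 0–1 range, as returned by the `evaluate` library. A minimal sketch of the metric computation; the prediction and reference lists are placeholders:

```python
import evaluate

rouge = evaluate.load("rouge")

# Placeholders: decoded model outputs and gold summaries from the eval set.
predictions = ["a generated summary ..."]
references = ["the reference summary ..."]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum
```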

### Framework versions

- Transformers 4.47.1
- PyTorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0