BluebrainAI/rotating-head-gp-gpt2-medium-wikitext

@@ -17,10 +17,10 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Accuracy: 0.4196
-- Bleu: 0.1339
-- Loss: 3.1985
-- Perplexity: 24.4954
 ## Model description
@@ -46,29 +46,33 @@ The following hyperparameters were used during training:
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 5
 ### Training results
-| Training Loss | Epoch  | Step | Accuracy | Bleu   | Validation Loss | Perplexity |
-|:-------------:|:------:|:----:|:--------:|:------:|:---------------:|:----------:|
-| 5.9062        | 0.2806 | 500  | 0.2234   | 0.0493 | 5.7470          | 313.2463   |
-| 4.8598        | 0.5612 | 1000 | 0.2811   | 0.0698 | 4.7428          | 114.7554   |
-| 4.3025        | 0.8418 | 1500 | 0.3170   | 0.0834 | 4.2329          | 68.9191    |
-| 3.9635        | 1.1223 | 2000 | 0.3454   | 0.0932 | 3.9291          | 50.8590    |
-| 3.7769        | 1.4029 | 2500 | 0.3636   | 0.1020 | 3.7427          | 42.2098    |
-| 3.6738        | 1.6835 | 3000 | 0.3754   | 0.1066 | 3.6225          | 37.4295    |
-| 3.5744        | 1.9641 | 3500 | 0.3845   | 0.1118 | 3.5325          | 34.2102    |
-| 3.456         | 2.2447 | 4000 | 0.3902   | 0.1139 | 3.4704          | 32.1497    |
-| 3.3972        | 2.5253 | 4500 | 0.3955   | 0.1230 | 3.4190          | 30.5384    |
-| 3.3654        | 2.8058 | 5000 | 0.4007   | 0.1230 | 3.3686          | 29.0392    |
-| 3.247         | 3.0864 | 5500 | 0.4043   | 0.1247 | 3.3328          | 28.0168    |
-| 3.2403        | 3.3670 | 6000 | 0.4083   | 0.1298 | 3.2985          | 27.0714    |
-| 3.2167        | 3.6476 | 6500 | 0.4112   | 0.1288 | 3.2693          | 26.2922    |
-| 3.1903        | 3.9282 | 7000 | 0.4134   | 0.1305 | 3.2456          | 25.6768    |
-| 3.1212        | 4.2088 | 7500 | 0.4161   | 0.1325 | 3.2262          | 25.1831    |
-| 3.0816        | 4.4893 | 8000 | 0.4176   | 0.1307 | 3.2128          | 24.8480    |
-| 3.0917        | 4.7699 | 8500 | 0.4196   | 0.1339 | 3.1985          | 24.4954    |
 ### Framework versions

 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 3.1790
+- Accuracy: 0.4217
+- Perplexity: 24.0231
+- Bleu: 0.1309
 ## Model description
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 6
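
The `linear` scheduler with `lr_scheduler_warmup_ratio: 0.1` can be sketched as a multiplier on the base learning rate. This is a minimal sketch, not the trainer's own code. The function name `linear_schedule_multiplier` is hypothetical. The step counts are assumptions inferred from the results table (step 500 at epoch 0.2806 implies about 1782 optimizer steps per epoch), and the warmup length is assumed to be rounded up.

```python
import math

def linear_schedule_multiplier(step, total_steps, warmup_ratio=0.1):
    """Learning-rate multiplier: linear warmup from 0 to 1, then linear decay to 0."""
    # Assumption: warmup length is the ratio of total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Assumption: inferred from the table row "epoch 0.2806 at step 500".
STEPS_PER_EPOCH = round(500 / 0.2806)
TOTAL_STEPS = STEPS_PER_EPOCH * 6  # num_epochs: 6

if __name__ == "__main__":
    for step in (0, 500, 1070, 5000, 10500, TOTAL_STEPS):
        mult = linear_schedule_multiplier(step, TOTAL_STEPS)
        print(f"step {step:>5}: multiplier {mult:.4f}")
```

Under these assumptions the multiplier is already close to zero at step 10500, which is consistent with the small changes in validation loss over the last few evaluations.
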
 ### Training results
+| Training Loss | Epoch  | Step  | Accuracy | Bleu   | Validation Loss | Perplexity |
+|:-------------:|:------:|:-----:|:--------:|:------:|:---------------:|:----------:|
+| 5.9062        | 0.2806 | 500   | 0.2234   | 0.0493 | 5.7470          | 313.2463   |
+| 4.8598        | 0.5612 | 1000  | 0.2811   | 0.0698 | 4.7428          | 114.7554   |
+| 4.3025        | 0.8418 | 1500  | 0.3170   | 0.0834 | 4.2329          | 68.9191    |
+| 3.9635        | 1.1223 | 2000  | 0.3454   | 0.0932 | 3.9291          | 50.8590    |
+| 3.7769        | 1.4029 | 2500  | 0.3636   | 0.1020 | 3.7427          | 42.2098    |
+| 3.6738        | 1.6835 | 3000  | 0.3754   | 0.1066 | 3.6225          | 37.4295    |
+| 3.5744        | 1.9641 | 3500  | 0.3845   | 0.1118 | 3.5325          | 34.2102    |
+| 3.456         | 2.2447 | 4000  | 0.3902   | 0.1139 | 3.4704          | 32.1497    |
+| 3.3972        | 2.5253 | 4500  | 0.3955   | 0.1230 | 3.4190          | 30.5384    |
+| 3.3654        | 2.8058 | 5000  | 0.4007   | 0.1230 | 3.3686          | 29.0392    |
+| 3.247         | 3.0864 | 5500  | 0.4043   | 0.1247 | 3.3328          | 28.0168    |
+| 3.2403        | 3.3670 | 6000  | 0.4083   | 0.1298 | 3.2985          | 27.0714    |
+| 3.2167        | 3.6476 | 6500  | 0.4112   | 0.1288 | 3.2693          | 26.2922    |
+| 3.1903        | 3.9282 | 7000  | 0.4134   | 0.1305 | 3.2456          | 25.6768    |
+| 3.1212        | 4.2088 | 7500  | 0.4161   | 0.1325 | 3.2262          | 25.1831    |
+| 3.0816        | 4.4893 | 8000  | 0.4176   | 0.1307 | 3.2128          | 24.8480    |
+| 3.0917        | 4.7699 | 8500  | 0.4196   | 0.1339 | 3.1985          | 24.4954    |
+| 3.0562        | 5.0505 | 9000  | 0.4185   | 0.1326 | 3.2049          | 24.6521    |
+| 3.0683        | 5.3311 | 9500  | 0.4195   | 0.1307 | 3.1970          | 24.4597    |
+| 3.0502        | 5.6117 | 10000 | 0.4209   | 0.1331 | 3.1857          | 24.1847    |
+| 3.0469        | 5.8923 | 10500 | 0.4217   | 0.1309 | 3.1790          | 24.0231    |
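
The reported perplexity appears to be the exponential of the evaluation loss. The sketch below checks this against the two summary metric sets in this diff (loss 3.1985 with perplexity 24.4954, and loss 3.1790 with perplexity 24.0231). The helper name `perplexity_from_loss` is hypothetical, and the check assumes the loss is a mean per-token cross-entropy in nats.

```python
import math

def perplexity_from_loss(loss):
    """Perplexity implied by a mean cross-entropy loss measured in nats."""
    return math.exp(loss)

# (evaluation loss, reported perplexity) pairs taken from the model card.
REPORTED = [
    (3.1985, 24.4954),  # before this change
    (3.1790, 24.0231),  # after this change
]

if __name__ == "__main__":
    for loss, reported_ppl in REPORTED:
        implied = perplexity_from_loss(loss)
        print(f"loss {loss:.4f}: implied {implied:.4f}, reported {reported_ppl:.4f}")
```

Small differences in the last decimal places are expected because the loss is rounded to four decimals before it is exponentiated here.
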
 ### Framework versions