|
2023-10-11 09:57:56,378 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:57:56,380 Model: "SequenceTagger( |
|
  (embeddings): ByT5Embeddings( |
|
    (model): T5EncoderModel( |
|
      (shared): Embedding(384, 1472) |
|
      (encoder): T5Stack( |
|
        (embed_tokens): Embedding(384, 1472) |
|
        (block): ModuleList( |
|
          (0): T5Block( |
|
            (layer): ModuleList( |
|
              (0): T5LayerSelfAttention( |
|
                (SelfAttention): T5Attention( |
|
                  (q): Linear(in_features=1472, out_features=384, bias=False) |
|
                  (k): Linear(in_features=1472, out_features=384, bias=False) |
|
                  (v): Linear(in_features=1472, out_features=384, bias=False) |
|
                  (o): Linear(in_features=384, out_features=1472, bias=False) |
|
                  (relative_attention_bias): Embedding(32, 6) |
|
                ) |
|
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
                (dropout): Dropout(p=0.1, inplace=False) |
|
              ) |
|
              (1): T5LayerFF( |
|
                (DenseReluDense): T5DenseGatedActDense( |
|
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False) |
|
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False) |
|
                  (wo): Linear(in_features=3584, out_features=1472, bias=False) |
|
                  (dropout): Dropout(p=0.1, inplace=False) |
|
                  (act): NewGELUActivation() |
|
                ) |
|
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
                (dropout): Dropout(p=0.1, inplace=False) |
|
              ) |
|
            ) |
|
          ) |
|
          (1-11): 11 x T5Block( |
|
            (layer): ModuleList( |
|
              (0): T5LayerSelfAttention( |
|
                (SelfAttention): T5Attention( |
|
                  (q): Linear(in_features=1472, out_features=384, bias=False) |
|
                  (k): Linear(in_features=1472, out_features=384, bias=False) |
|
                  (v): Linear(in_features=1472, out_features=384, bias=False) |
|
                  (o): Linear(in_features=384, out_features=1472, bias=False) |
|
                ) |
|
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
                (dropout): Dropout(p=0.1, inplace=False) |
|
              ) |
|
              (1): T5LayerFF( |
|
                (DenseReluDense): T5DenseGatedActDense( |
|
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False) |
|
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False) |
|
                  (wo): Linear(in_features=3584, out_features=1472, bias=False) |
|
                  (dropout): Dropout(p=0.1, inplace=False) |
|
                  (act): NewGELUActivation() |
|
                ) |
|
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
                (dropout): Dropout(p=0.1, inplace=False) |
|
              ) |
|
            ) |
|
          ) |
|
        ) |
|
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
        (dropout): Dropout(p=0.1, inplace=False) |
|
      ) |
|
    ) |
|
  ) |
|
  (locked_dropout): LockedDropout(p=0.5) |
|
  (linear): Linear(in_features=1472, out_features=17, bias=True) |
|
  (loss_function): CrossEntropyLoss() |
|
)" |
|
2023-10-11 09:57:56,380 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:57:56,381 MultiCorpus: 1085 train + 148 dev + 364 test sentences |
|
- NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator |
|
2023-10-11 09:57:56,381 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:57:56,381 Train: 1085 sentences |
|
2023-10-11 09:57:56,381 (train_with_dev=False, train_with_test=False) |
|
2023-10-11 09:57:56,381 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:57:56,381 Training Params: |
|
2023-10-11 09:57:56,381 - learning_rate: "0.00016" |
|
2023-10-11 09:57:56,381 - mini_batch_size: "4" |
|
2023-10-11 09:57:56,381 - max_epochs: "10" |
|
2023-10-11 09:57:56,381 - shuffle: "True" |
|
2023-10-11 09:57:56,381 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:57:56,381 Plugins: |
|
2023-10-11 09:57:56,381 - TensorboardLogger |
|
2023-10-11 09:57:56,382 - LinearScheduler | warmup_fraction: '0.1' |
|
2023-10-11 09:57:56,382 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:57:56,382 Final evaluation on model from best epoch (best-model.pt) |
|
2023-10-11 09:57:56,382 - metric: "('micro avg', 'f1-score')" |
|
2023-10-11 09:57:56,382 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:57:56,382 Computation: |
|
2023-10-11 09:57:56,382 - compute on device: cuda:0 |
|
2023-10-11 09:57:56,382 - embedding storage: none |
|
2023-10-11 09:57:56,382 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:57:56,382 Model training base path: "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2" |
|
2023-10-11 09:57:56,382 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:57:56,382 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:57:56,382 Logging anything other than scalars to TensorBoard is currently not supported. |
|
2023-10-11 09:58:05,782 epoch 1 - iter 27/272 - loss 2.84961659 - time (sec): 9.40 - samples/sec: 576.00 - lr: 0.000015 - momentum: 0.000000 |
|
2023-10-11 09:58:14,469 epoch 1 - iter 54/272 - loss 2.83915292 - time (sec): 18.08 - samples/sec: 545.61 - lr: 0.000031 - momentum: 0.000000 |
|
2023-10-11 09:58:23,736 epoch 1 - iter 81/272 - loss 2.81803749 - time (sec): 27.35 - samples/sec: 553.45 - lr: 0.000047 - momentum: 0.000000 |
|
2023-10-11 09:58:33,589 epoch 1 - iter 108/272 - loss 2.74804522 - time (sec): 37.20 - samples/sec: 564.68 - lr: 0.000063 - momentum: 0.000000 |
|
2023-10-11 09:58:42,886 epoch 1 - iter 135/272 - loss 2.65294364 - time (sec): 46.50 - samples/sec: 566.48 - lr: 0.000079 - momentum: 0.000000 |
|
2023-10-11 09:58:51,421 epoch 1 - iter 162/272 - loss 2.56578574 - time (sec): 55.04 - samples/sec: 556.46 - lr: 0.000095 - momentum: 0.000000 |
|
2023-10-11 09:59:00,707 epoch 1 - iter 189/272 - loss 2.45465098 - time (sec): 64.32 - samples/sec: 554.14 - lr: 0.000111 - momentum: 0.000000 |
|
2023-10-11 09:59:09,883 epoch 1 - iter 216/272 - loss 2.33901060 - time (sec): 73.50 - samples/sec: 554.24 - lr: 0.000126 - momentum: 0.000000 |
|
2023-10-11 09:59:19,912 epoch 1 - iter 243/272 - loss 2.19029882 - time (sec): 83.53 - samples/sec: 558.64 - lr: 0.000142 - momentum: 0.000000 |
|
2023-10-11 09:59:29,206 epoch 1 - iter 270/272 - loss 2.07205638 - time (sec): 92.82 - samples/sec: 559.30 - lr: 0.000158 - momentum: 0.000000 |
|
2023-10-11 09:59:29,534 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:59:29,535 EPOCH 1 done: loss 2.0705 - lr: 0.000158 |
|
2023-10-11 09:59:34,650 DEV : loss 0.7345565557479858 - f1-score (micro avg) 0.0 |
|
2023-10-11 09:59:34,658 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:59:44,295 epoch 2 - iter 27/272 - loss 0.70414994 - time (sec): 9.63 - samples/sec: 601.26 - lr: 0.000158 - momentum: 0.000000 |
|
2023-10-11 09:59:53,165 epoch 2 - iter 54/272 - loss 0.63540563 - time (sec): 18.50 - samples/sec: 581.59 - lr: 0.000157 - momentum: 0.000000 |
|
2023-10-11 10:00:02,825 epoch 2 - iter 81/272 - loss 0.63286981 - time (sec): 28.16 - samples/sec: 590.61 - lr: 0.000155 - momentum: 0.000000 |
|
2023-10-11 10:00:12,024 epoch 2 - iter 108/272 - loss 0.60068837 - time (sec): 37.36 - samples/sec: 583.88 - lr: 0.000153 - momentum: 0.000000 |
|
2023-10-11 10:00:21,275 epoch 2 - iter 135/272 - loss 0.58169819 - time (sec): 46.61 - samples/sec: 578.50 - lr: 0.000151 - momentum: 0.000000 |
|
2023-10-11 10:00:29,801 epoch 2 - iter 162/272 - loss 0.55542542 - time (sec): 55.14 - samples/sec: 569.54 - lr: 0.000149 - momentum: 0.000000 |
|
2023-10-11 10:00:38,726 epoch 2 - iter 189/272 - loss 0.54212044 - time (sec): 64.07 - samples/sec: 563.12 - lr: 0.000148 - momentum: 0.000000 |
|
2023-10-11 10:00:48,281 epoch 2 - iter 216/272 - loss 0.51858578 - time (sec): 73.62 - samples/sec: 561.94 - lr: 0.000146 - momentum: 0.000000 |
|
2023-10-11 10:00:57,669 epoch 2 - iter 243/272 - loss 0.50379454 - time (sec): 83.01 - samples/sec: 557.51 - lr: 0.000144 - momentum: 0.000000 |
|
2023-10-11 10:01:08,182 epoch 2 - iter 270/272 - loss 0.49383828 - time (sec): 93.52 - samples/sec: 554.15 - lr: 0.000142 - momentum: 0.000000 |
|
2023-10-11 10:01:08,590 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:01:08,590 EPOCH 2 done: loss 0.4930 - lr: 0.000142 |
|
2023-10-11 10:01:14,578 DEV : loss 0.3049907982349396 - f1-score (micro avg) 0.2867 |
|
2023-10-11 10:01:14,589 saving best model |
|
2023-10-11 10:01:15,564 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:01:26,175 epoch 3 - iter 27/272 - loss 0.36825189 - time (sec): 10.61 - samples/sec: 495.68 - lr: 0.000141 - momentum: 0.000000 |
|
2023-10-11 10:01:36,526 epoch 3 - iter 54/272 - loss 0.35828698 - time (sec): 20.96 - samples/sec: 489.94 - lr: 0.000139 - momentum: 0.000000 |
|
2023-10-11 10:01:46,586 epoch 3 - iter 81/272 - loss 0.33371894 - time (sec): 31.02 - samples/sec: 489.93 - lr: 0.000137 - momentum: 0.000000 |
|
2023-10-11 10:01:56,178 epoch 3 - iter 108/272 - loss 0.32842770 - time (sec): 40.61 - samples/sec: 500.40 - lr: 0.000135 - momentum: 0.000000 |
|
2023-10-11 10:02:06,182 epoch 3 - iter 135/272 - loss 0.32494522 - time (sec): 50.62 - samples/sec: 513.93 - lr: 0.000133 - momentum: 0.000000 |
|
2023-10-11 10:02:15,811 epoch 3 - iter 162/272 - loss 0.31495256 - time (sec): 60.24 - samples/sec: 514.72 - lr: 0.000132 - momentum: 0.000000 |
|
2023-10-11 10:02:26,550 epoch 3 - iter 189/272 - loss 0.31307865 - time (sec): 70.98 - samples/sec: 525.09 - lr: 0.000130 - momentum: 0.000000 |
|
2023-10-11 10:02:36,602 epoch 3 - iter 216/272 - loss 0.30261968 - time (sec): 81.04 - samples/sec: 527.34 - lr: 0.000128 - momentum: 0.000000 |
|
2023-10-11 10:02:45,602 epoch 3 - iter 243/272 - loss 0.30154536 - time (sec): 90.04 - samples/sec: 520.58 - lr: 0.000126 - momentum: 0.000000 |
|
2023-10-11 10:02:55,043 epoch 3 - iter 270/272 - loss 0.29990213 - time (sec): 99.48 - samples/sec: 519.97 - lr: 0.000125 - momentum: 0.000000 |
|
2023-10-11 10:02:55,531 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:02:55,531 EPOCH 3 done: loss 0.3001 - lr: 0.000125 |
|
2023-10-11 10:03:01,411 DEV : loss 0.23118416965007782 - f1-score (micro avg) 0.4514 |
|
2023-10-11 10:03:01,419 saving best model |
|
2023-10-11 10:03:03,968 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:03:13,204 epoch 4 - iter 27/272 - loss 0.26389527 - time (sec): 9.23 - samples/sec: 543.60 - lr: 0.000123 - momentum: 0.000000 |
|
2023-10-11 10:03:22,197 epoch 4 - iter 54/272 - loss 0.23946997 - time (sec): 18.22 - samples/sec: 524.90 - lr: 0.000121 - momentum: 0.000000 |
|
2023-10-11 10:03:32,348 epoch 4 - iter 81/272 - loss 0.22494046 - time (sec): 28.37 - samples/sec: 549.85 - lr: 0.000119 - momentum: 0.000000 |
|
2023-10-11 10:03:42,207 epoch 4 - iter 108/272 - loss 0.22128238 - time (sec): 38.23 - samples/sec: 551.86 - lr: 0.000117 - momentum: 0.000000 |
|
2023-10-11 10:03:51,557 epoch 4 - iter 135/272 - loss 0.21790054 - time (sec): 47.58 - samples/sec: 546.55 - lr: 0.000116 - momentum: 0.000000 |
|
2023-10-11 10:04:01,550 epoch 4 - iter 162/272 - loss 0.21355483 - time (sec): 57.58 - samples/sec: 549.19 - lr: 0.000114 - momentum: 0.000000 |
|
2023-10-11 10:04:10,733 epoch 4 - iter 189/272 - loss 0.21461498 - time (sec): 66.76 - samples/sec: 543.62 - lr: 0.000112 - momentum: 0.000000 |
|
2023-10-11 10:04:20,210 epoch 4 - iter 216/272 - loss 0.21161711 - time (sec): 76.24 - samples/sec: 543.55 - lr: 0.000110 - momentum: 0.000000 |
|
2023-10-11 10:04:29,699 epoch 4 - iter 243/272 - loss 0.21439789 - time (sec): 85.72 - samples/sec: 544.83 - lr: 0.000109 - momentum: 0.000000 |
|
2023-10-11 10:04:38,989 epoch 4 - iter 270/272 - loss 0.21064388 - time (sec): 95.01 - samples/sec: 545.07 - lr: 0.000107 - momentum: 0.000000 |
|
2023-10-11 10:04:39,417 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:04:39,418 EPOCH 4 done: loss 0.2106 - lr: 0.000107 |
|
2023-10-11 10:04:45,299 DEV : loss 0.17590400576591492 - f1-score (micro avg) 0.5839 |
|
2023-10-11 10:04:45,309 saving best model |
|
2023-10-11 10:04:47,847 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:04:56,859 epoch 5 - iter 27/272 - loss 0.16095320 - time (sec): 9.01 - samples/sec: 518.32 - lr: 0.000105 - momentum: 0.000000 |
|
2023-10-11 10:05:06,266 epoch 5 - iter 54/272 - loss 0.15303063 - time (sec): 18.41 - samples/sec: 545.17 - lr: 0.000103 - momentum: 0.000000 |
|
2023-10-11 10:05:15,581 epoch 5 - iter 81/272 - loss 0.14715890 - time (sec): 27.73 - samples/sec: 553.13 - lr: 0.000101 - momentum: 0.000000 |
|
2023-10-11 10:05:24,566 epoch 5 - iter 108/272 - loss 0.14859354 - time (sec): 36.71 - samples/sec: 551.36 - lr: 0.000100 - momentum: 0.000000 |
|
2023-10-11 10:05:33,383 epoch 5 - iter 135/272 - loss 0.13938521 - time (sec): 45.53 - samples/sec: 549.22 - lr: 0.000098 - momentum: 0.000000 |
|
2023-10-11 10:05:43,089 epoch 5 - iter 162/272 - loss 0.13872093 - time (sec): 55.24 - samples/sec: 559.36 - lr: 0.000096 - momentum: 0.000000 |
|
2023-10-11 10:05:52,177 epoch 5 - iter 189/272 - loss 0.14314448 - time (sec): 64.33 - samples/sec: 558.08 - lr: 0.000094 - momentum: 0.000000 |
|
2023-10-11 10:06:01,633 epoch 5 - iter 216/272 - loss 0.14519265 - time (sec): 73.78 - samples/sec: 559.16 - lr: 0.000093 - momentum: 0.000000 |
|
2023-10-11 10:06:11,241 epoch 5 - iter 243/272 - loss 0.14574570 - time (sec): 83.39 - samples/sec: 559.96 - lr: 0.000091 - momentum: 0.000000 |
|
2023-10-11 10:06:20,699 epoch 5 - iter 270/272 - loss 0.14307060 - time (sec): 92.85 - samples/sec: 556.93 - lr: 0.000089 - momentum: 0.000000 |
|
2023-10-11 10:06:21,219 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:06:21,219 EPOCH 5 done: loss 0.1430 - lr: 0.000089 |
|
2023-10-11 10:06:26,814 DEV : loss 0.15896683931350708 - f1-score (micro avg) 0.6123 |
|
2023-10-11 10:06:26,824 saving best model |
|
2023-10-11 10:06:29,363 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:06:39,090 epoch 6 - iter 27/272 - loss 0.12861859 - time (sec): 9.72 - samples/sec: 563.75 - lr: 0.000087 - momentum: 0.000000 |
|
2023-10-11 10:06:48,027 epoch 6 - iter 54/272 - loss 0.12625714 - time (sec): 18.66 - samples/sec: 546.90 - lr: 0.000085 - momentum: 0.000000 |
|
2023-10-11 10:06:57,647 epoch 6 - iter 81/272 - loss 0.11812378 - time (sec): 28.28 - samples/sec: 551.60 - lr: 0.000084 - momentum: 0.000000 |
|
2023-10-11 10:07:07,496 epoch 6 - iter 108/272 - loss 0.11590135 - time (sec): 38.13 - samples/sec: 563.80 - lr: 0.000082 - momentum: 0.000000 |
|
2023-10-11 10:07:16,590 epoch 6 - iter 135/272 - loss 0.11257005 - time (sec): 47.22 - samples/sec: 548.24 - lr: 0.000080 - momentum: 0.000000 |
|
2023-10-11 10:07:26,339 epoch 6 - iter 162/272 - loss 0.10607560 - time (sec): 56.97 - samples/sec: 553.59 - lr: 0.000078 - momentum: 0.000000 |
|
2023-10-11 10:07:35,893 epoch 6 - iter 189/272 - loss 0.10194287 - time (sec): 66.53 - samples/sec: 550.51 - lr: 0.000077 - momentum: 0.000000 |
|
2023-10-11 10:07:45,318 epoch 6 - iter 216/272 - loss 0.10614524 - time (sec): 75.95 - samples/sec: 547.98 - lr: 0.000075 - momentum: 0.000000 |
|
2023-10-11 10:07:54,921 epoch 6 - iter 243/272 - loss 0.10334886 - time (sec): 85.55 - samples/sec: 547.68 - lr: 0.000073 - momentum: 0.000000 |
|
2023-10-11 10:08:04,092 epoch 6 - iter 270/272 - loss 0.10347814 - time (sec): 94.72 - samples/sec: 546.02 - lr: 0.000071 - momentum: 0.000000 |
|
2023-10-11 10:08:04,598 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:08:04,599 EPOCH 6 done: loss 0.1033 - lr: 0.000071 |
|
2023-10-11 10:08:10,290 DEV : loss 0.1453726589679718 - f1-score (micro avg) 0.6964 |
|
2023-10-11 10:08:10,299 saving best model |
|
2023-10-11 10:08:12,837 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:08:22,005 epoch 7 - iter 27/272 - loss 0.08200013 - time (sec): 9.16 - samples/sec: 526.74 - lr: 0.000069 - momentum: 0.000000 |
|
2023-10-11 10:08:31,942 epoch 7 - iter 54/272 - loss 0.07169928 - time (sec): 19.10 - samples/sec: 549.62 - lr: 0.000068 - momentum: 0.000000 |
|
2023-10-11 10:08:41,790 epoch 7 - iter 81/272 - loss 0.06750559 - time (sec): 28.95 - samples/sec: 552.18 - lr: 0.000066 - momentum: 0.000000 |
|
2023-10-11 10:08:51,214 epoch 7 - iter 108/272 - loss 0.06876153 - time (sec): 38.37 - samples/sec: 549.77 - lr: 0.000064 - momentum: 0.000000 |
|
2023-10-11 10:09:01,123 epoch 7 - iter 135/272 - loss 0.07382666 - time (sec): 48.28 - samples/sec: 549.32 - lr: 0.000062 - momentum: 0.000000 |
|
2023-10-11 10:09:10,687 epoch 7 - iter 162/272 - loss 0.07376310 - time (sec): 57.85 - samples/sec: 542.87 - lr: 0.000061 - momentum: 0.000000 |
|
2023-10-11 10:09:19,909 epoch 7 - iter 189/272 - loss 0.07298171 - time (sec): 67.07 - samples/sec: 533.52 - lr: 0.000059 - momentum: 0.000000 |
|
2023-10-11 10:09:30,231 epoch 7 - iter 216/272 - loss 0.07154073 - time (sec): 77.39 - samples/sec: 535.81 - lr: 0.000057 - momentum: 0.000000 |
|
2023-10-11 10:09:39,463 epoch 7 - iter 243/272 - loss 0.07659368 - time (sec): 86.62 - samples/sec: 535.04 - lr: 0.000055 - momentum: 0.000000 |
|
2023-10-11 10:09:49,375 epoch 7 - iter 270/272 - loss 0.07547255 - time (sec): 96.53 - samples/sec: 536.22 - lr: 0.000054 - momentum: 0.000000 |
|
2023-10-11 10:09:49,841 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:09:49,841 EPOCH 7 done: loss 0.0753 - lr: 0.000054 |
|
2023-10-11 10:09:55,608 DEV : loss 0.1472298800945282 - f1-score (micro avg) 0.7478 |
|
2023-10-11 10:09:55,617 saving best model |
|
2023-10-11 10:09:58,161 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:10:07,364 epoch 8 - iter 27/272 - loss 0.07327268 - time (sec): 9.20 - samples/sec: 499.40 - lr: 0.000052 - momentum: 0.000000 |
|
2023-10-11 10:10:16,567 epoch 8 - iter 54/272 - loss 0.05912708 - time (sec): 18.40 - samples/sec: 510.50 - lr: 0.000050 - momentum: 0.000000 |
|
2023-10-11 10:10:26,716 epoch 8 - iter 81/272 - loss 0.06315843 - time (sec): 28.55 - samples/sec: 538.72 - lr: 0.000048 - momentum: 0.000000 |
|
2023-10-11 10:10:36,403 epoch 8 - iter 108/272 - loss 0.06440136 - time (sec): 38.24 - samples/sec: 540.82 - lr: 0.000046 - momentum: 0.000000 |
|
2023-10-11 10:10:45,979 epoch 8 - iter 135/272 - loss 0.06543166 - time (sec): 47.81 - samples/sec: 541.26 - lr: 0.000045 - momentum: 0.000000 |
|
2023-10-11 10:10:54,660 epoch 8 - iter 162/272 - loss 0.06830101 - time (sec): 56.49 - samples/sec: 531.16 - lr: 0.000043 - momentum: 0.000000 |
|
2023-10-11 10:11:04,624 epoch 8 - iter 189/272 - loss 0.06687904 - time (sec): 66.46 - samples/sec: 535.98 - lr: 0.000041 - momentum: 0.000000 |
|
2023-10-11 10:11:15,233 epoch 8 - iter 216/272 - loss 0.06417230 - time (sec): 77.07 - samples/sec: 545.82 - lr: 0.000039 - momentum: 0.000000 |
|
2023-10-11 10:11:24,573 epoch 8 - iter 243/272 - loss 0.06203206 - time (sec): 86.41 - samples/sec: 541.39 - lr: 0.000038 - momentum: 0.000000 |
|
2023-10-11 10:11:34,240 epoch 8 - iter 270/272 - loss 0.05987987 - time (sec): 96.07 - samples/sec: 539.08 - lr: 0.000036 - momentum: 0.000000 |
|
2023-10-11 10:11:34,685 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:11:34,686 EPOCH 8 done: loss 0.0599 - lr: 0.000036 |
|
2023-10-11 10:11:40,788 DEV : loss 0.1424403041601181 - f1-score (micro avg) 0.7653 |
|
2023-10-11 10:11:40,796 saving best model |
|
2023-10-11 10:11:43,341 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:11:51,856 epoch 9 - iter 27/272 - loss 0.07666643 - time (sec): 8.51 - samples/sec: 504.34 - lr: 0.000034 - momentum: 0.000000 |
|
2023-10-11 10:12:01,071 epoch 9 - iter 54/272 - loss 0.07280220 - time (sec): 17.73 - samples/sec: 533.59 - lr: 0.000032 - momentum: 0.000000 |
|
2023-10-11 10:12:10,786 epoch 9 - iter 81/272 - loss 0.05972387 - time (sec): 27.44 - samples/sec: 527.72 - lr: 0.000030 - momentum: 0.000000 |
|
2023-10-11 10:12:20,137 epoch 9 - iter 108/272 - loss 0.06294634 - time (sec): 36.79 - samples/sec: 531.16 - lr: 0.000029 - momentum: 0.000000 |
|
2023-10-11 10:12:29,121 epoch 9 - iter 135/272 - loss 0.06234610 - time (sec): 45.78 - samples/sec: 522.90 - lr: 0.000027 - momentum: 0.000000 |
|
2023-10-11 10:12:39,294 epoch 9 - iter 162/272 - loss 0.05914625 - time (sec): 55.95 - samples/sec: 534.53 - lr: 0.000025 - momentum: 0.000000 |
|
2023-10-11 10:12:48,743 epoch 9 - iter 189/272 - loss 0.05774082 - time (sec): 65.40 - samples/sec: 535.18 - lr: 0.000023 - momentum: 0.000000 |
|
2023-10-11 10:12:58,443 epoch 9 - iter 216/272 - loss 0.05539923 - time (sec): 75.10 - samples/sec: 537.09 - lr: 0.000022 - momentum: 0.000000 |
|
2023-10-11 10:13:08,510 epoch 9 - iter 243/272 - loss 0.05258815 - time (sec): 85.16 - samples/sec: 543.20 - lr: 0.000020 - momentum: 0.000000 |
|
2023-10-11 10:13:18,255 epoch 9 - iter 270/272 - loss 0.05055787 - time (sec): 94.91 - samples/sec: 545.35 - lr: 0.000018 - momentum: 0.000000 |
|
2023-10-11 10:13:18,703 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:13:18,703 EPOCH 9 done: loss 0.0506 - lr: 0.000018 |
|
2023-10-11 10:13:24,318 DEV : loss 0.14669708907604218 - f1-score (micro avg) 0.7576 |
|
2023-10-11 10:13:24,327 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:13:34,114 epoch 10 - iter 27/272 - loss 0.05704288 - time (sec): 9.79 - samples/sec: 547.25 - lr: 0.000016 - momentum: 0.000000 |
|
2023-10-11 10:13:43,165 epoch 10 - iter 54/272 - loss 0.06282741 - time (sec): 18.84 - samples/sec: 532.27 - lr: 0.000014 - momentum: 0.000000 |
|
2023-10-11 10:13:52,252 epoch 10 - iter 81/272 - loss 0.05470713 - time (sec): 27.92 - samples/sec: 535.00 - lr: 0.000013 - momentum: 0.000000 |
|
2023-10-11 10:14:01,156 epoch 10 - iter 108/272 - loss 0.05095954 - time (sec): 36.83 - samples/sec: 534.87 - lr: 0.000011 - momentum: 0.000000 |
|
2023-10-11 10:14:11,443 epoch 10 - iter 135/272 - loss 0.04981593 - time (sec): 47.11 - samples/sec: 550.36 - lr: 0.000009 - momentum: 0.000000 |
|
2023-10-11 10:14:20,845 epoch 10 - iter 162/272 - loss 0.04680562 - time (sec): 56.52 - samples/sec: 545.26 - lr: 0.000007 - momentum: 0.000000 |
|
2023-10-11 10:14:30,321 epoch 10 - iter 189/272 - loss 0.04579138 - time (sec): 65.99 - samples/sec: 545.81 - lr: 0.000005 - momentum: 0.000000 |
|
2023-10-11 10:14:39,821 epoch 10 - iter 216/272 - loss 0.04474162 - time (sec): 75.49 - samples/sec: 548.68 - lr: 0.000004 - momentum: 0.000000 |
|
2023-10-11 10:14:49,718 epoch 10 - iter 243/272 - loss 0.04341634 - time (sec): 85.39 - samples/sec: 549.72 - lr: 0.000002 - momentum: 0.000000 |
|
2023-10-11 10:14:58,779 epoch 10 - iter 270/272 - loss 0.04483462 - time (sec): 94.45 - samples/sec: 547.28 - lr: 0.000000 - momentum: 0.000000 |
|
2023-10-11 10:14:59,278 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:14:59,278 EPOCH 10 done: loss 0.0447 - lr: 0.000000 |
|
2023-10-11 10:15:04,833 DEV : loss 0.14579297602176666 - f1-score (micro avg) 0.7607 |
|
2023-10-11 10:15:05,711 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 10:15:05,713 Loading model from best epoch ... |
|
2023-10-11 10:15:09,559 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG |
|
2023-10-11 10:15:22,453 |
|
Results: |
|
- F-score (micro) 0.7359 |
|
- F-score (macro) 0.6371 |
|
- Accuracy 0.6043 |
|
|
|
By class: |
|
              precision    recall  f1-score   support |
|
|
|
         LOC     0.7330    0.8622    0.7923       312 |
|
         PER     0.6784    0.8317    0.7473       208 |
|
         ORG     0.3415    0.2545    0.2917        55 |
|
   HumanProd     0.6129    0.8636    0.7170        22 |
|
|
|
   micro avg     0.6844    0.7956    0.7359       597 |
|
   macro avg     0.5914    0.7030    0.6371       597 |
|
weighted avg     0.6735    0.7956    0.7277       597 |
|
|
|
2023-10-11 10:15:22,454 ---------------------------------------------------------------------------------------------------- |
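
The averaged rows in the "By class" table above can be cross-checked from the four per-class rows. The sketch below is not part of the training run and is not Flair's or scikit-learn's code. It assumes the integer counts (true positives and predicted spans) can be recovered by rounding, since the log only prints ratios to four decimals. The names `report` and `summarize` are hypothetical.

```python
# Per-class rows copied from the log: (precision, recall, f1, support).
report = {
    "LOC": (0.7330, 0.8622, 0.7923, 312),
    "PER": (0.6784, 0.8317, 0.7473, 208),
    "ORG": (0.3415, 0.2545, 0.2917, 55),
    "HumanProd": (0.6129, 0.8636, 0.7170, 22),
}


def summarize(rows):
    """Recompute micro, macro and weighted averages from per-class rows."""
    total_support = sum(s for _, _, _, s in rows.values())

    # Assumption: counts are recovered by rounding the printed ratios.
    # true positives = recall * support; predicted spans = true positives / precision
    tp = {name: round(r * s) for name, (_, r, _, s) in rows.items()}
    predicted = {name: round(tp[name] / p) for name, (p, _, _, _) in rows.items()}

    total_tp = sum(tp.values())
    total_predicted = sum(predicted.values())

    return {
        # Micro average: pool the counts over all classes first.
        "micro_precision": total_tp / total_predicted,
        "micro_recall": total_tp / total_support,
        "micro_f1": 2 * total_tp / (total_predicted + total_support),
        # Macro average: unweighted mean of the per-class F1 scores.
        "macro_f1": sum(f for _, _, f, _ in rows.values()) / len(rows),
        # Weighted average: per-class F1 weighted by support.
        "weighted_f1": sum(f * s for _, _, f, s in rows.values()) / total_support,
    }


summary = summarize(report)
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```

Under these assumptions the recomputed micro, macro and weighted F1 agree with the logged 0.7359, 0.6371 and 0.7277. The gap between micro and macro F1 comes mostly from the ORG class, whose low F1 counts as much as LOC in the macro mean despite having only 55 of the 597 gold spans.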
|
|
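
The `lr` column in the per-iteration lines is consistent with the `LinearScheduler` plugin listed in the header (`warmup_fraction: '0.1'`): a linear ramp up to the configured `learning_rate` of 0.00016 over the first 10% of steps, then a linear decay to zero. The sketch below reproduces that shape. It is an illustration, not Flair's implementation. It assumes 272 steps per epoch for 10 epochs (2720 steps in total), and it assumes the logged value at iteration k reflects k - 1 completed scheduler steps, an offset inferred only from matching the printed numbers. The function names are hypothetical.

```python
def linear_schedule_lr(step, peak_lr=0.00016, total_steps=2720, warmup_fraction=0.1):
    """Linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


def logged_lr(epoch, iteration, steps_per_epoch=272):
    """Learning rate formatted as in the log for a given epoch and iteration.

    Assumption: the log line for iteration k is written after k - 1
    scheduler steps within the epoch.
    """
    step = (epoch - 1) * steps_per_epoch + iteration - 1
    return f"{linear_schedule_lr(step):.6f}"


# Compare a few points against the log above.
for epoch, iteration in [(1, 27), (1, 216), (2, 27), (5, 108), (10, 270)]:
    print(f"epoch {epoch} - iter {iteration}/272 - lr: {logged_lr(epoch, iteration)}")
```

With these assumptions, warmup ends exactly at the end of epoch 1, which matches the peak of 0.000158 logged at the epoch 1 to epoch 2 boundary.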