2023-10-11 09:39:50,455 ----------------------------------------------------------------------------------------------------
2023-10-11 09:39:50,457 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 09:39:50,457 ----------------------------------------------------------------------------------------------------
2023-10-11 09:39:50,458 MultiCorpus: 1085 train + 148 dev + 364 test sentences
 - NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-11 09:39:50,458 ----------------------------------------------------------------------------------------------------
2023-10-11 09:39:50,458 Train: 1085 sentences
2023-10-11 09:39:50,458 (train_with_dev=False, train_with_test=False)
2023-10-11 09:39:50,458 ----------------------------------------------------------------------------------------------------
2023-10-11 09:39:50,458 Training Params:
2023-10-11 09:39:50,458  - learning_rate: "0.00015"
2023-10-11 09:39:50,458  - mini_batch_size: "4"
2023-10-11 09:39:50,458  - max_epochs: "10"
2023-10-11 09:39:50,458  - shuffle: "True"
2023-10-11 09:39:50,458 ----------------------------------------------------------------------------------------------------
2023-10-11 09:39:50,459 Plugins:
2023-10-11 09:39:50,459  - TensorboardLogger
2023-10-11 09:39:50,459  - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 09:39:50,459 ----------------------------------------------------------------------------------------------------
2023-10-11 09:39:50,459 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 09:39:50,459  - metric: "('micro avg', 'f1-score')"
2023-10-11 09:39:50,459
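The `lr` column in the iteration records below follows the `LinearScheduler` plugin with `warmup_fraction: '0.1'`: the learning rate ramps linearly from 0 to the peak 0.00015 over the first 10% of the roughly 2720 total steps (10 epochs x 272 mini-batches, i.e. about the first epoch), then decays linearly to 0. A minimal sketch of that schedule in plain Python (an illustration of the logged behaviour, not Flair's actual `LinearScheduler` code):

```python
def linear_schedule(step, total_steps, peak_lr=0.00015, warmup_fraction=0.1):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 2720  # 10 epochs x 272 mini-batches per epoch
print(linear_schedule(27, total))    # early epoch 1, matches the logged ~0.000014
print(linear_schedule(272, total))   # end of warmup, peak ~0.00015
print(linear_schedule(2720, total))  # final step, decayed to 0
```

The values reproduce the log closely: iteration 27 of epoch 1 gives 0.00015 * 27/272 ~ 0.000015, and the last iterations of epoch 10 approach 0.000000.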
----------------------------------------------------------------------------------------------------
2023-10-11 09:39:50,459 Computation:
2023-10-11 09:39:50,459  - compute on device: cuda:0
2023-10-11 09:39:50,459  - embedding storage: none
2023-10-11 09:39:50,459 ----------------------------------------------------------------------------------------------------
2023-10-11 09:39:50,459 Model training base path: "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2"
2023-10-11 09:39:50,459 ----------------------------------------------------------------------------------------------------
2023-10-11 09:39:50,459 ----------------------------------------------------------------------------------------------------
2023-10-11 09:39:50,460 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 09:40:00,360 epoch 1 - iter 27/272 - loss 2.84978598 - time (sec): 9.90 - samples/sec: 546.87 - lr: 0.000014 - momentum: 0.000000
2023-10-11 09:40:09,671 epoch 1 - iter 54/272 - loss 2.83993499 - time (sec): 19.21 - samples/sec: 513.66 - lr: 0.000029 - momentum: 0.000000
2023-10-11 09:40:19,438 epoch 1 - iter 81/272 - loss 2.82028870 - time (sec): 28.98 - samples/sec: 522.42 - lr: 0.000044 - momentum: 0.000000
2023-10-11 09:40:29,609 epoch 1 - iter 108/272 - loss 2.75600317 - time (sec): 39.15 - samples/sec: 536.66 - lr: 0.000059 - momentum: 0.000000
2023-10-11 09:40:39,237 epoch 1 - iter 135/272 - loss 2.66472267 - time (sec): 48.78 - samples/sec: 540.07 - lr: 0.000074 - momentum: 0.000000
2023-10-11 09:40:48,005 epoch 1 - iter 162/272 - loss 2.58070954 - time (sec): 57.54 - samples/sec: 532.22 - lr: 0.000089 - momentum: 0.000000
2023-10-11 09:40:57,344 epoch 1 - iter 189/272 - loss 2.47427847 - time (sec): 66.88 - samples/sec: 532.94 - lr: 0.000104 - momentum: 0.000000
2023-10-11 09:41:06,659 epoch 1 - iter 216/272 - loss 2.36367265 - time (sec): 76.20 - samples/sec: 534.61 - lr: 0.000119 - momentum: 0.000000
2023-10-11 09:41:17,028 epoch 1 - iter 243/272 - loss 2.22144061 - time (sec): 86.57 - samples/sec: 539.03 - lr: 0.000133 - momentum: 0.000000
2023-10-11 09:41:26,523 epoch 1 - iter 270/272 - loss 2.10748412 - time (sec): 96.06 - samples/sec: 540.43 - lr: 0.000148 - momentum: 0.000000
2023-10-11 09:41:26,853 ----------------------------------------------------------------------------------------------------
2023-10-11 09:41:26,853 EPOCH 1 done: loss 2.1060 - lr: 0.000148
2023-10-11 09:41:32,029 DEV : loss 0.8020414710044861 - f1-score (micro avg) 0.0
2023-10-11 09:41:32,037 ----------------------------------------------------------------------------------------------------
2023-10-11 09:41:42,064 epoch 2 - iter 27/272 - loss 0.76893037 - time (sec): 10.03 - samples/sec: 577.84 - lr: 0.000148 - momentum: 0.000000
2023-10-11 09:41:51,392 epoch 2 - iter 54/272 - loss 0.69007728 - time (sec): 19.35 - samples/sec: 556.09 - lr: 0.000147 - momentum: 0.000000
2023-10-11 09:42:01,289 epoch 2 - iter 81/272 - loss 0.67431997 - time (sec): 29.25 - samples/sec: 568.69 - lr: 0.000145 - momentum: 0.000000
2023-10-11 09:42:10,880 epoch 2 - iter 108/272 - loss 0.63443527 - time (sec): 38.84 - samples/sec: 561.67 - lr: 0.000143 - momentum: 0.000000
2023-10-11 09:42:20,670 epoch 2 - iter 135/272 - loss 0.61291708 - time (sec): 48.63 - samples/sec: 554.52 - lr: 0.000142 - momentum: 0.000000
2023-10-11 09:42:29,890 epoch 2 - iter 162/272 - loss 0.58952666 - time (sec): 57.85 - samples/sec: 542.86 - lr: 0.000140 - momentum: 0.000000
2023-10-11 09:42:39,479 epoch 2 - iter 189/272 - loss 0.57453427 - time (sec): 67.44 - samples/sec: 534.95 - lr: 0.000138 - momentum: 0.000000
2023-10-11 09:42:49,535 epoch 2 - iter 216/272 - loss 0.54891050 - time (sec): 77.50 - samples/sec: 533.83 - lr: 0.000137 - momentum: 0.000000
2023-10-11 09:42:59,160 epoch 2 - iter 243/272 - loss 0.53267587 - time (sec): 87.12 - samples/sec: 531.19 - lr: 0.000135 - momentum: 0.000000
2023-10-11 09:43:09,206 epoch 2 - iter 270/272 - loss 0.52022697 - time (sec): 97.17 - samples/sec: 533.36 - lr: 0.000134 - momentum: 0.000000
2023-10-11 09:43:09,606 ----------------------------------------------------------------------------------------------------
2023-10-11 09:43:09,606 EPOCH 2 done: loss 0.5193 - lr: 0.000134
2023-10-11 09:43:15,513 DEV : loss 0.3020303547382355 - f1-score (micro avg) 0.2903
2023-10-11 09:43:15,522 saving best model
2023-10-11 09:43:16,375 ----------------------------------------------------------------------------------------------------
2023-10-11 09:43:25,815 epoch 3 - iter 27/272 - loss 0.38410607 - time (sec): 9.44 - samples/sec: 557.12 - lr: 0.000132 - momentum: 0.000000
2023-10-11 09:43:35,064 epoch 3 - iter 54/272 - loss 0.37333646 - time (sec): 18.69 - samples/sec: 549.52 - lr: 0.000130 - momentum: 0.000000
2023-10-11 09:43:44,340 epoch 3 - iter 81/272 - loss 0.34926921 - time (sec): 27.96 - samples/sec: 543.47 - lr: 0.000128 - momentum: 0.000000
2023-10-11 09:43:53,690 epoch 3 - iter 108/272 - loss 0.34301949 - time (sec): 37.31 - samples/sec: 544.64 - lr: 0.000127 - momentum: 0.000000
2023-10-11 09:44:03,560 epoch 3 - iter 135/272 - loss 0.33694811 - time (sec): 47.18 - samples/sec: 551.32 - lr: 0.000125 - momentum: 0.000000
2023-10-11 09:44:12,886 epoch 3 - iter 162/272 - loss 0.32617109 - time (sec): 56.51 - samples/sec: 548.75 - lr: 0.000123 - momentum: 0.000000
2023-10-11 09:44:23,467 epoch 3 - iter 189/272 - loss 0.32458657 - time (sec): 67.09 - samples/sec: 555.56 - lr: 0.000122 - momentum: 0.000000
2023-10-11 09:44:33,403 epoch 3 - iter 216/272 - loss 0.31370703 - time (sec): 77.03 - samples/sec: 554.78 - lr: 0.000120 - momentum: 0.000000
2023-10-11 09:44:42,131 epoch 3 - iter 243/272 - loss 0.31287050 - time (sec): 85.75 - samples/sec: 546.57 - lr: 0.000119 - momentum: 0.000000
2023-10-11 09:44:51,444 epoch 3 - iter 270/272 - loss 0.31168254 - time (sec): 95.07 - samples/sec: 544.09 - lr: 0.000117 - momentum: 0.000000
2023-10-11 09:44:51,938 ----------------------------------------------------------------------------------------------------
2023-10-11 09:44:51,938 EPOCH 3 done: loss 0.3122 - lr: 0.000117
2023-10-11 09:44:57,957 DEV : loss 0.2577632665634155 - f1-score (micro avg) 0.3305
2023-10-11 09:44:57,965 saving best model
2023-10-11 09:45:00,499 ----------------------------------------------------------------------------------------------------
2023-10-11 09:45:09,969 epoch 4 - iter 27/272 - loss 0.29354878 - time (sec): 9.47 - samples/sec: 530.04 - lr: 0.000115 - momentum: 0.000000
2023-10-11 09:45:18,936 epoch 4 - iter 54/272 - loss 0.26285013 - time (sec): 18.43 - samples/sec: 518.92 - lr: 0.000113 - momentum: 0.000000
2023-10-11 09:45:28,972 epoch 4 - iter 81/272 - loss 0.24810936 - time (sec): 28.47 - samples/sec: 548.02 - lr: 0.000112 - momentum: 0.000000
2023-10-11 09:45:38,709 epoch 4 - iter 108/272 - loss 0.24358018 - time (sec): 38.21 - samples/sec: 552.25 - lr: 0.000110 - momentum: 0.000000
2023-10-11 09:45:47,860 epoch 4 - iter 135/272 - loss 0.23901215 - time (sec): 47.36 - samples/sec: 549.15 - lr: 0.000108 - momentum: 0.000000
2023-10-11 09:45:57,856 epoch 4 - iter 162/272 - loss 0.23297876 - time (sec): 57.35 - samples/sec: 551.33 - lr: 0.000107 - momentum: 0.000000
2023-10-11 09:46:06,942 epoch 4 - iter 189/272 - loss 0.23528726 - time (sec): 66.44 - samples/sec: 546.24 - lr: 0.000105 - momentum: 0.000000
2023-10-11 09:46:16,464 epoch 4 - iter 216/272 - loss 0.23243463 - time (sec): 75.96 - samples/sec: 545.52 - lr: 0.000103 - momentum: 0.000000
2023-10-11 09:46:26,378 epoch 4 - iter 243/272 - loss 0.23708393 - time (sec): 85.87 - samples/sec: 543.87 - lr: 0.000102 - momentum: 0.000000
2023-10-11 09:46:35,978 epoch 4 - iter 270/272 - loss 0.23306336 - time (sec): 95.47 - samples/sec: 542.45 - lr: 0.000100 - momentum: 0.000000
2023-10-11 09:46:36,427 ----------------------------------------------------------------------------------------------------
2023-10-11 09:46:36,428 EPOCH 4 done: loss 0.2329 - lr: 0.000100
2023-10-11 09:46:42,295 DEV : loss 0.19623495638370514 - f1-score (micro avg) 0.5471
2023-10-11 09:46:42,304 saving best model
2023-10-11 09:46:44,891 ----------------------------------------------------------------------------------------------------
2023-10-11 09:46:54,037 epoch 5 - iter 27/272 - loss 0.18848352 - time (sec): 9.14 - samples/sec: 510.71 - lr: 0.000098 - momentum: 0.000000
2023-10-11 09:47:03,653 epoch 5 - iter 54/272 - loss 0.17630539 - time (sec): 18.76 - samples/sec: 535.19 - lr: 0.000097 - momentum: 0.000000
2023-10-11 09:47:13,136 epoch 5 - iter 81/272 - loss 0.16835017 - time (sec): 28.24 - samples/sec: 543.11 - lr: 0.000095 - momentum: 0.000000
2023-10-11 09:47:22,285 epoch 5 - iter 108/272 - loss 0.16954062 - time (sec): 37.39 - samples/sec: 541.39 - lr: 0.000093 - momentum: 0.000000
2023-10-11 09:47:31,421 epoch 5 - iter 135/272 - loss 0.15990292 - time (sec): 46.53 - samples/sec: 537.48 - lr: 0.000092 - momentum: 0.000000
2023-10-11 09:47:41,467 epoch 5 - iter 162/272 - loss 0.15873693 - time (sec): 56.57 - samples/sec: 546.17 - lr: 0.000090 - momentum: 0.000000
2023-10-11 09:47:50,820 epoch 5 - iter 189/272 - loss 0.16239154 - time (sec): 65.93 - samples/sec: 544.54 - lr: 0.000088 - momentum: 0.000000
2023-10-11 09:48:00,587 epoch 5 - iter 216/272 - loss 0.16394956 - time (sec): 75.69 - samples/sec: 545.05 - lr: 0.000087 - momentum: 0.000000
2023-10-11 09:48:10,258 epoch 5 - iter 243/272 - loss 0.16439694 - time (sec): 85.36 - samples/sec: 547.01 - lr: 0.000085 - momentum: 0.000000
2023-10-11 09:48:19,506 epoch 5 - iter 270/272 - loss 0.16219348 - time (sec): 94.61 - samples/sec: 546.56 - lr: 0.000084 - momentum: 0.000000
2023-10-11 09:48:20,008 ----------------------------------------------------------------------------------------------------
2023-10-11 09:48:20,009 EPOCH 5 done: loss 0.1620 - lr: 0.000084
2023-10-11 09:48:25,523 DEV : loss 0.16873042285442352 - f1-score (micro avg) 0.6128
2023-10-11 09:48:25,531 saving best model
2023-10-11 09:48:28,074 ----------------------------------------------------------------------------------------------------
2023-10-11 09:48:37,600 epoch 6 - iter 27/272 - loss 0.14413262 - time (sec): 9.52 - samples/sec: 575.59 - lr: 0.000082 - momentum: 0.000000
2023-10-11 09:48:46,412 epoch 6 - iter 54/272 - loss 0.14208677 - time (sec): 18.33 - samples/sec: 556.61 - lr: 0.000080 - momentum: 0.000000
2023-10-11 09:48:55,754 epoch 6 - iter 81/272 - loss 0.13479681 - time (sec): 27.68 - samples/sec: 563.64 - lr: 0.000078 - momentum: 0.000000
2023-10-11 09:49:05,615 epoch 6 - iter 108/272 - loss 0.13231184 - time (sec): 37.54 - samples/sec: 572.69 - lr: 0.000077 - momentum: 0.000000
2023-10-11 09:49:14,856 epoch 6 - iter 135/272 - loss 0.12931757 - time (sec): 46.78 - samples/sec: 553.45 - lr: 0.000075 - momentum: 0.000000
2023-10-11 09:49:24,535 epoch 6 - iter 162/272 - loss 0.12321844 - time (sec): 56.46 - samples/sec: 558.64 - lr: 0.000073 - momentum: 0.000000
2023-10-11 09:49:33,780 epoch 6 - iter 189/272 - loss 0.11952690 - time (sec): 65.70 - samples/sec: 557.41 - lr: 0.000072 - momentum: 0.000000
2023-10-11 09:49:42,914 epoch 6 - iter 216/272 - loss 0.12316648 - time (sec): 74.84 - samples/sec: 556.15 - lr: 0.000070 - momentum: 0.000000
2023-10-11 09:49:52,434 epoch 6 - iter 243/272 - loss 0.12004095 - time (sec): 84.36 - samples/sec: 555.46 - lr: 0.000069 - momentum: 0.000000
2023-10-11 09:50:01,550 epoch 6 - iter 270/272 - loss 0.12026909 - time (sec): 93.47 - samples/sec: 553.33 - lr: 0.000067 - momentum: 0.000000
2023-10-11 09:50:02,085 ----------------------------------------------------------------------------------------------------
2023-10-11 09:50:02,085 EPOCH 6 done: loss 0.1200 - lr: 0.000067
2023-10-11 09:50:07,714 DEV : loss 0.1488361954689026 - f1-score (micro avg) 0.6112
2023-10-11 09:50:07,722 ----------------------------------------------------------------------------------------------------
2023-10-11 09:50:17,152 epoch 7 - iter 27/272 - loss 0.09847727 - time (sec): 9.43 - samples/sec: 511.97 - lr: 0.000065 - momentum: 0.000000
2023-10-11 09:50:27,127 epoch 7 - iter 54/272 - loss 0.09016802 - time (sec): 19.40 - samples/sec: 541.04 - lr: 0.000063 - momentum: 0.000000
2023-10-11 09:50:37,100 epoch 7 - iter 81/272 - loss 0.08601675 - time (sec): 29.38 - samples/sec: 544.14 - lr: 0.000062 - momentum: 0.000000
2023-10-11 09:50:46,118 epoch 7 - iter 108/272 - loss 0.08647169 - time (sec): 38.39 - samples/sec: 549.45 - lr: 0.000060 - momentum: 0.000000
2023-10-11 09:50:55,554 epoch 7 - iter 135/272 - loss 0.09090164 - time (sec): 47.83 - samples/sec: 554.50 - lr: 0.000058 - momentum: 0.000000
2023-10-11 09:51:04,902 epoch 7 - iter 162/272 - loss 0.08989021 - time (sec): 57.18 - samples/sec: 549.21 - lr: 0.000057 - momentum: 0.000000
2023-10-11 09:51:13,584 epoch 7 - iter 189/272 - loss 0.08994632 - time (sec): 65.86 - samples/sec: 543.30 - lr: 0.000055 - momentum: 0.000000
2023-10-11 09:51:23,911 epoch 7 - iter 216/272 - loss 0.08845138 - time (sec): 76.19 - samples/sec: 544.27 - lr: 0.000053 - momentum: 0.000000
2023-10-11 09:51:34,418 epoch 7 - iter 243/272 - loss 0.09342558 - time (sec): 86.69 - samples/sec: 534.59 - lr: 0.000052 - momentum: 0.000000
2023-10-11 09:51:45,463 epoch 7 - iter 270/272 - loss 0.09226835 - time (sec): 97.74 - samples/sec: 529.60 - lr: 0.000050 - momentum: 0.000000
2023-10-11 09:51:46,024 ----------------------------------------------------------------------------------------------------
2023-10-11 09:51:46,025 EPOCH 7 done: loss 0.0920 - lr: 0.000050
2023-10-11 09:51:51,630 DEV : loss 0.14688394963741302 - f1-score (micro avg) 0.6654
2023-10-11 09:51:51,638 saving best model
2023-10-11 09:51:54,157 ----------------------------------------------------------------------------------------------------
2023-10-11 09:52:03,922 epoch 8 - iter 27/272 - loss 0.08937485 - time (sec): 9.76 - samples/sec: 470.65 - lr: 0.000048 - momentum: 0.000000
2023-10-11 09:52:13,039 epoch 8 - iter 54/272 - loss 0.07136459 - time (sec): 18.88 - samples/sec: 497.63 - lr: 0.000047 - momentum: 0.000000
2023-10-11 09:52:23,152 epoch 8 - iter 81/272 - loss 0.07727213 - time (sec): 28.99 - samples/sec: 530.55 - lr: 0.000045 - momentum: 0.000000
2023-10-11 09:52:32,583 epoch 8 - iter 108/272 - loss 0.07738875 - time (sec): 38.42 - samples/sec: 538.24 - lr: 0.000043 - momentum: 0.000000
2023-10-11 09:52:41,991 epoch 8 - iter 135/272 - loss 0.07911098 - time (sec): 47.83 - samples/sec: 541.09 - lr: 0.000042 - momentum: 0.000000
2023-10-11 09:52:50,431 epoch 8 - iter 162/272 - loss 0.08340609 - time (sec): 56.27 - samples/sec: 533.28 - lr: 0.000040 - momentum: 0.000000
2023-10-11 09:53:00,002 epoch 8 - iter 189/272 - loss 0.08129659 - time (sec): 65.84 - samples/sec: 541.02 - lr: 0.000038 - momentum: 0.000000
2023-10-11 09:53:10,477 epoch 8 - iter 216/272 - loss 0.07877295 - time (sec): 76.32 - samples/sec: 551.20 - lr: 0.000037 - momentum: 0.000000
2023-10-11 09:53:19,525 epoch 8 - iter 243/272 - loss 0.07667042 - time (sec): 85.36 - samples/sec: 548.01 - lr: 0.000035 - momentum: 0.000000
2023-10-11 09:53:29,019 epoch 8 - iter 270/272 - loss 0.07426787 - time (sec): 94.86 - samples/sec: 546.00 - lr: 0.000034 - momentum: 0.000000
2023-10-11 09:53:29,435 ----------------------------------------------------------------------------------------------------
2023-10-11 09:53:29,435 EPOCH 8 done: loss 0.0742 - lr: 0.000034
2023-10-11 09:53:34,876 DEV : loss 0.14787980914115906 - f1-score (micro avg) 0.7432
2023-10-11 09:53:34,884 saving best model
2023-10-11 09:53:37,399 ----------------------------------------------------------------------------------------------------
2023-10-11 09:53:46,140 epoch 9 - iter 27/272 - loss 0.09102105 - time (sec): 8.74 - samples/sec: 491.25 - lr: 0.000032 - momentum: 0.000000
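The "saving best model" lines appear only when the dev micro-F1 improves on the best score so far: epoch 6 (0.6112, just below epoch 5's 0.6128) and epoch 10 (0.7401, below epoch 9's 0.7505) trigger no save, and neither does epoch 1's 0.0. A minimal sketch of that selection logic in plain Python, using the dev scores from this run and assuming the tracker starts at 0.0 with strict improvement (an illustration, not Flair's trainer code):

```python
# Dev micro-F1 after each of the 10 epochs, taken from the log above.
dev_f1_by_epoch = [0.0, 0.2903, 0.3305, 0.5471, 0.6128,
                   0.6112, 0.6654, 0.7432, 0.7505, 0.7401]

best_score = 0.0
saved_epochs = []  # epochs after which best-model.pt would be overwritten
for epoch, score in enumerate(dev_f1_by_epoch, start=1):
    if score > best_score:  # strict improvement only
        best_score = score
        saved_epochs.append(epoch)

print(saved_epochs)  # [2, 3, 4, 5, 7, 8, 9]
print(best_score)    # 0.7505 (epoch 9, the checkpoint evaluated on test)
```

This matches the log: best-model.pt ends up holding the epoch-9 weights, which are then reloaded for the final test evaluation.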
2023-10-11 09:53:55,379 epoch 9 - iter 54/272 - loss 0.08935916 - time (sec): 17.98 - samples/sec: 526.14 - lr: 0.000030 - momentum: 0.000000
2023-10-11 09:54:04,769 epoch 9 - iter 81/272 - loss 0.07481512 - time (sec): 27.37 - samples/sec: 529.16 - lr: 0.000028 - momentum: 0.000000
2023-10-11 09:54:14,186 epoch 9 - iter 108/272 - loss 0.07867968 - time (sec): 36.78 - samples/sec: 531.27 - lr: 0.000027 - momentum: 0.000000
2023-10-11 09:54:22,976 epoch 9 - iter 135/272 - loss 0.07834757 - time (sec): 45.57 - samples/sec: 525.22 - lr: 0.000025 - momentum: 0.000000
2023-10-11 09:54:33,025 epoch 9 - iter 162/272 - loss 0.07369543 - time (sec): 55.62 - samples/sec: 537.67 - lr: 0.000023 - momentum: 0.000000
2023-10-11 09:54:42,330 epoch 9 - iter 189/272 - loss 0.07182289 - time (sec): 64.93 - samples/sec: 539.05 - lr: 0.000022 - momentum: 0.000000
2023-10-11 09:54:52,050 epoch 9 - iter 216/272 - loss 0.06958838 - time (sec): 74.65 - samples/sec: 540.33 - lr: 0.000020 - momentum: 0.000000
2023-10-11 09:55:02,157 epoch 9 - iter 243/272 - loss 0.06633126 - time (sec): 84.75 - samples/sec: 545.83 - lr: 0.000019 - momentum: 0.000000
2023-10-11 09:55:11,700 epoch 9 - iter 270/272 - loss 0.06401853 - time (sec): 94.30 - samples/sec: 548.89 - lr: 0.000017 - momentum: 0.000000
2023-10-11 09:55:12,130 ----------------------------------------------------------------------------------------------------
2023-10-11 09:55:12,130 EPOCH 9 done: loss 0.0641 - lr: 0.000017
2023-10-11 09:55:17,955 DEV : loss 0.14741504192352295 - f1-score (micro avg) 0.7505
2023-10-11 09:55:17,963 saving best model
2023-10-11 09:55:20,524 ----------------------------------------------------------------------------------------------------
2023-10-11 09:55:29,829 epoch 10 - iter 27/272 - loss 0.07188521 - time (sec): 9.30 - samples/sec: 575.75 - lr: 0.000015 - momentum: 0.000000
2023-10-11 09:55:38,625 epoch 10 - iter 54/272 - loss 0.07716434 - time (sec): 18.10 - samples/sec: 554.04 - lr: 0.000013 - momentum: 0.000000
2023-10-11 09:55:47,519 epoch 10 - iter 81/272 - loss 0.06791166 - time (sec): 26.99 - samples/sec: 553.49 - lr: 0.000012 - momentum: 0.000000
2023-10-11 09:55:56,393 epoch 10 - iter 108/272 - loss 0.06458565 - time (sec): 35.86 - samples/sec: 549.24 - lr: 0.000010 - momentum: 0.000000
2023-10-11 09:56:06,420 epoch 10 - iter 135/272 - loss 0.06332167 - time (sec): 45.89 - samples/sec: 565.03 - lr: 0.000008 - momentum: 0.000000
2023-10-11 09:56:15,810 epoch 10 - iter 162/272 - loss 0.05986959 - time (sec): 55.28 - samples/sec: 557.44 - lr: 0.000007 - momentum: 0.000000
2023-10-11 09:56:25,615 epoch 10 - iter 189/272 - loss 0.05919230 - time (sec): 65.09 - samples/sec: 553.40 - lr: 0.000005 - momentum: 0.000000
2023-10-11 09:56:35,088 epoch 10 - iter 216/272 - loss 0.05792051 - time (sec): 74.56 - samples/sec: 555.55 - lr: 0.000003 - momentum: 0.000000
2023-10-11 09:56:45,050 epoch 10 - iter 243/272 - loss 0.05609555 - time (sec): 84.52 - samples/sec: 555.36 - lr: 0.000002 - momentum: 0.000000
2023-10-11 09:56:54,243 epoch 10 - iter 270/272 - loss 0.05776757 - time (sec): 93.71 - samples/sec: 551.58 - lr: 0.000000 - momentum: 0.000000
2023-10-11 09:56:54,775 ----------------------------------------------------------------------------------------------------
2023-10-11 09:56:54,775 EPOCH 10 done: loss 0.0576 - lr: 0.000000
2023-10-11 09:57:00,361 DEV : loss 0.14719465374946594 - f1-score (micro avg) 0.7401
2023-10-11 09:57:01,192 ----------------------------------------------------------------------------------------------------
2023-10-11 09:57:01,194 Loading model from best epoch ...
2023-10-11 09:57:05,952 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-11 09:57:18,000 Results:
- F-score (micro) 0.7087
- F-score (macro) 0.6243
- Accuracy 0.5811

By class:
              precision    recall  f1-score   support

         LOC     0.6899    0.8558    0.7639       312
         PER     0.6667    0.7212    0.6928       208
         ORG     0.5263    0.3636    0.4301        55
   HumanProd     0.4865    0.8182    0.6102        22

   micro avg     0.6623    0.7621    0.7087       597
   macro avg     0.5923    0.6897    0.6243       597
weighted avg     0.6593    0.7621    0.7028       597

2023-10-11 09:57:18,000 ----------------------------------------------------------------------------------------------------
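The final numbers are internally consistent: the micro-average F1 of 0.7087 is the harmonic mean of the reported micro precision (0.6623) and recall (0.7621), and the 17-tag output dictionary is exactly the BIOES scheme (Single/Begin/End/Inside) over the four entity types plus O. A quick check in plain Python:

```python
# Micro-averaged F1 is the harmonic mean of micro precision and recall.
precision, recall = 0.6623, 0.7621
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.7087, matching the reported F-score (micro)

# 17 tags = O plus S/B/E/I variants of the four entity types (BIOES tagging).
entity_types = ["LOC", "PER", "HumanProd", "ORG"]
tags = ["O"] + [f"{p}-{t}" for t in entity_types for p in ("S", "B", "E", "I")]
print(len(tags))  # 17
```

This also explains the classifier head above: `Linear(in_features=1472, out_features=17)` maps each 1472-dimensional ByT5 embedding to one score per tag.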