2023-10-12 18:58:20,800 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,802 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 18:58:20,803 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,803 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-12 18:58:20,803 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,803 Train:  5777 sentences
2023-10-12 18:58:20,803         (train_with_dev=False, train_with_test=False)
2023-10-12 18:58:20,803 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,803 Training Params:
2023-10-12 18:58:20,803  - learning_rate: "0.00016"
2023-10-12 18:58:20,803  - mini_batch_size: "8"
2023-10-12 18:58:20,803  - max_epochs: "10"
2023-10-12 18:58:20,804  - shuffle: "True"
2023-10-12 18:58:20,804 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,804 Plugins:
2023-10-12 18:58:20,804  - TensorboardLogger
2023-10-12 18:58:20,804  - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 18:58:20,804 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,804 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 18:58:20,804  - metric: "('micro avg', 'f1-score')"
2023-10-12 18:58:20,804 ----------------------------------------------------------------------------------------------------
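The module dump above pins down every tensor shape, so the encoder size can be cross-checked by hand. A back-of-envelope sketch in plain Python (not Flair; all variable names are mine), counting the shared byte embedding once since `embed_tokens` is tied to `shared`:

```python
# Shapes taken from the SequenceTagger dump above (byt5-small-style encoder).
d_model, d_ff, d_attn = 1472, 3584, 384   # hidden, feed-forward, attention inner dim
vocab, n_blocks = 384, 12                 # byte vocabulary, T5 encoder blocks

embedding = vocab * d_model               # (shared) Embedding(384, 1472), tied with embed_tokens
attention = 4 * d_model * d_attn          # q, k, v, o projections, all bias-free
rel_bias  = 32 * 6                        # relative_attention_bias: Embedding(32, 6), block 0 only
ffn       = 3 * d_model * d_ff            # wi_0, wi_1, wo of the gated feed-forward
norms     = 2 * d_model                   # two FusedRMSNorm weight vectors per block

encoder = embedding + n_blocks * (attention + ffn + norms) + rel_bias + d_model
head    = d_model * 13 + 13               # Linear(1472, 13) tagging head, with bias

print(f"encoder: {encoder:,}  head: {head:,}")  # encoder: 217,657,472  head: 19,149
```

Roughly 218M encoder weights plus a 19k-parameter tagging head, which is consistent with a small-size ByT5 encoder fine-tuned with a linear classifier on 13 tags.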
2023-10-12 18:58:20,804 Computation:
2023-10-12 18:58:20,804  - compute on device: cuda:0
2023-10-12 18:58:20,804  - embedding storage: none
2023-10-12 18:58:20,804 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,804 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5"
2023-10-12 18:58:20,804 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,804 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,805 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 18:59:00,054 epoch 1 - iter 72/723 - loss 2.54355580 - time (sec): 39.25 - samples/sec: 460.13 - lr: 0.000016 - momentum: 0.000000
2023-10-12 18:59:38,638 epoch 1 - iter 144/723 - loss 2.47351916 - time (sec): 77.83 - samples/sec: 463.81 - lr: 0.000032 - momentum: 0.000000
2023-10-12 19:00:17,358 epoch 1 - iter 216/723 - loss 2.30912831 - time (sec): 116.55 - samples/sec: 449.38 - lr: 0.000048 - momentum: 0.000000
2023-10-12 19:00:59,860 epoch 1 - iter 288/723 - loss 2.09160479 - time (sec): 159.05 - samples/sec: 440.50 - lr: 0.000064 - momentum: 0.000000
2023-10-12 19:01:41,319 epoch 1 - iter 360/723 - loss 1.86564000 - time (sec): 200.51 - samples/sec: 438.01 - lr: 0.000079 - momentum: 0.000000
2023-10-12 19:02:21,012 epoch 1 - iter 432/723 - loss 1.64989151 - time (sec): 240.21 - samples/sec: 435.87 - lr: 0.000095 - momentum: 0.000000
2023-10-12 19:03:00,546 epoch 1 - iter 504/723 - loss 1.45108438 - time (sec): 279.74 - samples/sec: 437.65 - lr: 0.000111 - momentum: 0.000000
2023-10-12 19:03:39,300 epoch 1 - iter 576/723 - loss 1.30837807 - time (sec): 318.49 - samples/sec: 437.09 - lr: 0.000127 - momentum: 0.000000
2023-10-12 19:04:19,048 epoch 1 - iter 648/723 - loss 1.18554857 - time (sec): 358.24 - samples/sec: 438.05 - lr: 0.000143 - momentum: 0.000000
2023-10-12 19:05:00,227 epoch 1 - iter 720/723 - loss 1.07844162 - time (sec): 399.42 - samples/sec: 439.29 - lr: 0.000159 - momentum: 0.000000
2023-10-12 19:05:01,624 ----------------------------------------------------------------------------------------------------
2023-10-12 19:05:01,624 EPOCH 1 done: loss 1.0744 - lr: 0.000159
2023-10-12 19:05:21,904 DEV : loss 0.2226296067237854 - f1-score (micro avg)  0.0
2023-10-12 19:05:21,937 ----------------------------------------------------------------------------------------------------
2023-10-12 19:06:00,883 epoch 2 - iter 72/723 - loss 0.16071835 - time (sec): 38.94 - samples/sec: 447.92 - lr: 0.000158 - momentum: 0.000000
2023-10-12 19:06:40,410 epoch 2 - iter 144/723 - loss 0.14928826 - time (sec): 78.47 - samples/sec: 451.95 - lr: 0.000156 - momentum: 0.000000
2023-10-12 19:07:19,001 epoch 2 - iter 216/723 - loss 0.14640929 - time (sec): 117.06 - samples/sec: 444.07 - lr: 0.000155 - momentum: 0.000000
2023-10-12 19:07:58,952 epoch 2 - iter 288/723 - loss 0.14351447 - time (sec): 157.01 - samples/sec: 444.43 - lr: 0.000153 - momentum: 0.000000
2023-10-12 19:08:37,927 epoch 2 - iter 360/723 - loss 0.13927936 - time (sec): 195.99 - samples/sec: 445.13 - lr: 0.000151 - momentum: 0.000000
2023-10-12 19:09:17,408 epoch 2 - iter 432/723 - loss 0.13562770 - time (sec): 235.47 - samples/sec: 445.08 - lr: 0.000149 - momentum: 0.000000
2023-10-12 19:09:56,251 epoch 2 - iter 504/723 - loss 0.13508475 - time (sec): 274.31 - samples/sec: 445.09 - lr: 0.000148 - momentum: 0.000000
2023-10-12 19:10:35,074 epoch 2 - iter 576/723 - loss 0.13232611 - time (sec): 313.13 - samples/sec: 446.82 - lr: 0.000146 - momentum: 0.000000
2023-10-12 19:11:14,451 epoch 2 - iter 648/723 - loss 0.12941887 - time (sec): 352.51 - samples/sec: 448.75 - lr: 0.000144 - momentum: 0.000000
2023-10-12 19:11:53,004 epoch 2 - iter 720/723 - loss 0.12559223 - time (sec): 391.07 - samples/sec: 449.04 - lr: 0.000142 - momentum: 0.000000
2023-10-12 19:11:54,252 ----------------------------------------------------------------------------------------------------
2023-10-12 19:11:54,252 EPOCH 2 done: loss 0.1254 - lr: 0.000142
2023-10-12 19:12:15,858 DEV : loss 0.10742620378732681 - f1-score (micro avg)  0.7805
2023-10-12 19:12:15,891 saving best model
2023-10-12 19:12:16,807 ----------------------------------------------------------------------------------------------------
2023-10-12 19:12:56,695 epoch 3 - iter 72/723 - loss 0.08288091 - time (sec): 39.89 - samples/sec: 448.88 - lr: 0.000140 - momentum: 0.000000
2023-10-12 19:13:36,974 epoch 3 - iter 144/723 - loss 0.07985183 - time (sec): 80.16 - samples/sec: 448.64 - lr: 0.000139 - momentum: 0.000000
2023-10-12 19:14:16,040 epoch 3 - iter 216/723 - loss 0.07952059 - time (sec): 119.23 - samples/sec: 448.91 - lr: 0.000137 - momentum: 0.000000
2023-10-12 19:14:54,814 epoch 3 - iter 288/723 - loss 0.07742485 - time (sec): 158.01 - samples/sec: 449.88 - lr: 0.000135 - momentum: 0.000000
2023-10-12 19:15:33,909 epoch 3 - iter 360/723 - loss 0.07655356 - time (sec): 197.10 - samples/sec: 451.75 - lr: 0.000133 - momentum: 0.000000
2023-10-12 19:16:13,793 epoch 3 - iter 432/723 - loss 0.07627322 - time (sec): 236.98 - samples/sec: 454.76 - lr: 0.000132 - momentum: 0.000000
2023-10-12 19:16:52,959 epoch 3 - iter 504/723 - loss 0.07598384 - time (sec): 276.15 - samples/sec: 453.26 - lr: 0.000130 - momentum: 0.000000
2023-10-12 19:17:32,309 epoch 3 - iter 576/723 - loss 0.07582371 - time (sec): 315.50 - samples/sec: 450.64 - lr: 0.000128 - momentum: 0.000000
2023-10-12 19:18:11,512 epoch 3 - iter 648/723 - loss 0.07469627 - time (sec): 354.70 - samples/sec: 447.48 - lr: 0.000126 - momentum: 0.000000
2023-10-12 19:18:50,760 epoch 3 - iter 720/723 - loss 0.07352052 - time (sec): 393.95 - samples/sec: 445.53 - lr: 0.000125 - momentum: 0.000000
2023-10-12 19:18:52,075 ----------------------------------------------------------------------------------------------------
2023-10-12 19:18:52,075 EPOCH 3 done: loss 0.0736 - lr: 0.000125
2023-10-12 19:19:13,745 DEV : loss 0.07580851018428802 - f1-score (micro avg)  0.8611
2023-10-12 19:19:13,776 saving best model
2023-10-12 19:19:24,857 ----------------------------------------------------------------------------------------------------
2023-10-12 19:20:05,578 epoch 4 - iter 72/723 - loss 0.05191155 - time (sec): 40.72 - samples/sec: 440.48 - lr: 0.000123 - momentum: 0.000000
2023-10-12 19:20:44,504 epoch 4 - iter 144/723 - loss 0.05203988 - time (sec): 79.64 - samples/sec: 436.31 - lr: 0.000121 - momentum: 0.000000
2023-10-12 19:21:22,936 epoch 4 - iter 216/723 - loss 0.04893082 - time (sec): 118.07 - samples/sec: 444.03 - lr: 0.000119 - momentum: 0.000000
2023-10-12 19:22:01,345 epoch 4 - iter 288/723 - loss 0.04936358 - time (sec): 156.48 - samples/sec: 456.68 - lr: 0.000117 - momentum: 0.000000
2023-10-12 19:22:39,287 epoch 4 - iter 360/723 - loss 0.04704445 - time (sec): 194.43 - samples/sec: 458.90 - lr: 0.000116 - momentum: 0.000000
2023-10-12 19:23:17,810 epoch 4 - iter 432/723 - loss 0.04632235 - time (sec): 232.95 - samples/sec: 455.86 - lr: 0.000114 - momentum: 0.000000
2023-10-12 19:23:57,203 epoch 4 - iter 504/723 - loss 0.04588963 - time (sec): 272.34 - samples/sec: 452.32 - lr: 0.000112 - momentum: 0.000000
2023-10-12 19:24:36,716 epoch 4 - iter 576/723 - loss 0.04525655 - time (sec): 311.85 - samples/sec: 453.47 - lr: 0.000110 - momentum: 0.000000
2023-10-12 19:25:18,379 epoch 4 - iter 648/723 - loss 0.04859378 - time (sec): 353.52 - samples/sec: 449.36 - lr: 0.000109 - momentum: 0.000000
2023-10-12 19:25:56,777 epoch 4 - iter 720/723 - loss 0.04735869 - time (sec): 391.92 - samples/sec: 448.64 - lr: 0.000107 - momentum: 0.000000
2023-10-12 19:25:57,841 ----------------------------------------------------------------------------------------------------
2023-10-12 19:25:57,841 EPOCH 4 done: loss 0.0475 - lr: 0.000107
2023-10-12 19:26:19,114 DEV : loss 0.09613429009914398 - f1-score (micro avg)  0.8346
2023-10-12 19:26:19,147 ----------------------------------------------------------------------------------------------------
2023-10-12 19:26:59,987 epoch 5 - iter 72/723 - loss 0.03592579 - time (sec): 40.84 - samples/sec: 462.23 - lr: 0.000105 - momentum: 0.000000
2023-10-12 19:27:38,528 epoch 5 - iter 144/723 - loss 0.03119838 - time (sec): 79.38 - samples/sec: 457.18 - lr: 0.000103 - momentum: 0.000000
2023-10-12 19:28:16,016 epoch 5 - iter 216/723 - loss 0.03042988 - time (sec): 116.87 - samples/sec: 443.70 - lr: 0.000101 - momentum: 0.000000
2023-10-12 19:28:54,004 epoch 5 - iter 288/723 - loss 0.03005227 - time (sec): 154.85 - samples/sec: 439.46 - lr: 0.000100 - momentum: 0.000000
2023-10-12 19:29:34,928 epoch 5 - iter 360/723 - loss 0.03217606 - time (sec): 195.78 - samples/sec: 442.46 - lr: 0.000098 - momentum: 0.000000
2023-10-12 19:30:14,684 epoch 5 - iter 432/723 - loss 0.03112412 - time (sec): 235.53 - samples/sec: 441.59 - lr: 0.000096 - momentum: 0.000000
2023-10-12 19:30:55,240 epoch 5 - iter 504/723 - loss 0.03176554 - time (sec): 276.09 - samples/sec: 443.79 - lr: 0.000094 - momentum: 0.000000
2023-10-12 19:31:34,003 epoch 5 - iter 576/723 - loss 0.03180365 - time (sec): 314.85 - samples/sec: 445.61 - lr: 0.000093 - momentum: 0.000000
2023-10-12 19:32:13,719 epoch 5 - iter 648/723 - loss 0.03213951 - time (sec): 354.57 - samples/sec: 445.33 - lr: 0.000091 - momentum: 0.000000
2023-10-12 19:32:54,951 epoch 5 - iter 720/723 - loss 0.03253080 - time (sec): 395.80 - samples/sec: 443.07 - lr: 0.000089 - momentum: 0.000000
2023-10-12 19:32:56,682 ----------------------------------------------------------------------------------------------------
2023-10-12 19:32:56,683 EPOCH 5 done: loss 0.0326 - lr: 0.000089
2023-10-12 19:33:18,519 DEV : loss 0.08075438439846039 - f1-score (micro avg)  0.8604
2023-10-12 19:33:18,549 ----------------------------------------------------------------------------------------------------
2023-10-12 19:33:57,261 epoch 6 - iter 72/723 - loss 0.02372279 - time (sec): 38.71 - samples/sec: 444.51 - lr: 0.000087 - momentum: 0.000000
2023-10-12 19:34:36,141 epoch 6 - iter 144/723 - loss 0.02332990 - time (sec): 77.59 - samples/sec: 445.00 - lr: 0.000085 - momentum: 0.000000
2023-10-12 19:35:15,551 epoch 6 - iter 216/723 - loss 0.02529176 - time (sec): 117.00 - samples/sec: 447.90 - lr: 0.000084 - momentum: 0.000000
2023-10-12 19:35:56,603 epoch 6 - iter 288/723 - loss 0.02502072 - time (sec): 158.05 - samples/sec: 445.49 - lr: 0.000082 - momentum: 0.000000
2023-10-12 19:36:37,040 epoch 6 - iter 360/723 - loss 0.02478190 - time (sec): 198.49 - samples/sec: 444.01 - lr: 0.000080 - momentum: 0.000000
2023-10-12 19:37:16,368 epoch 6 - iter 432/723 - loss 0.02272948 - time (sec): 237.82 - samples/sec: 448.09 - lr: 0.000078 - momentum: 0.000000
2023-10-12 19:37:55,143 epoch 6 - iter 504/723 - loss 0.02482061 - time (sec): 276.59 - samples/sec: 449.75 - lr: 0.000077 - momentum: 0.000000
2023-10-12 19:38:32,812 epoch 6 - iter 576/723 - loss 0.02360557 - time (sec): 314.26 - samples/sec: 448.53 - lr: 0.000075 - momentum: 0.000000
2023-10-12 19:39:11,274 epoch 6 - iter 648/723 - loss 0.02300141 - time (sec): 352.72 - samples/sec: 446.97 - lr: 0.000073 - momentum: 0.000000
2023-10-12 19:39:53,495 epoch 6 - iter 720/723 - loss 0.02395673 - time (sec): 394.94 - samples/sec: 444.81 - lr: 0.000071 - momentum: 0.000000
2023-10-12 19:39:54,699 ----------------------------------------------------------------------------------------------------
2023-10-12 19:39:54,700 EPOCH 6 done: loss 0.0240 - lr: 0.000071
2023-10-12 19:40:16,957 DEV : loss 0.09915146231651306 - f1-score (micro avg)  0.8614
2023-10-12 19:40:16,992 saving best model
2023-10-12 19:40:19,580 ----------------------------------------------------------------------------------------------------
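The lr column in the iteration lines above follows the `LinearScheduler | warmup_fraction: '0.1'` plugin from the run header: linear warmup over the first 10% of the 7,230 total steps (723 iterations per epoch × 10 epochs) up to the peak learning rate 0.00016, then linear decay to zero. A minimal plain-Python sketch of such a schedule (the function name is mine, not Flair's API) reproduces the logged values:

```python
def linear_schedule(step: int, total_steps: int, peak_lr: float, warmup_fraction: float) -> float:
    """Linear warmup to peak_lr over warmup_fraction of training, then linear decay to 0."""
    warmup = total_steps * warmup_fraction
    if step < warmup:
        return peak_lr * step / warmup
    return peak_lr * (total_steps - step) / (total_steps - warmup)

total = 723 * 10  # iterations per epoch x max_epochs
for step in (72, 720, 723 + 72, 9 * 723 + 720):
    print(f"step {step}: lr {linear_schedule(step, total, 0.00016, 0.1):.6f}")
```

This prints 0.000016 at step 72 (epoch 1, iter 72), 0.000159 at step 720, 0.000158 at step 795 (epoch 2, iter 72), and 0.000000 at step 7227, matching the log to the six decimals it reports.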
2023-10-12 19:41:00,926 epoch 7 - iter 72/723 - loss 0.02262591 - time (sec): 41.34 - samples/sec: 426.67 - lr: 0.000069 - momentum: 0.000000
2023-10-12 19:41:42,902 epoch 7 - iter 144/723 - loss 0.02190572 - time (sec): 83.32 - samples/sec: 426.99 - lr: 0.000068 - momentum: 0.000000
2023-10-12 19:42:24,325 epoch 7 - iter 216/723 - loss 0.02121091 - time (sec): 124.74 - samples/sec: 417.62 - lr: 0.000066 - momentum: 0.000000
2023-10-12 19:43:04,878 epoch 7 - iter 288/723 - loss 0.02062404 - time (sec): 165.29 - samples/sec: 415.31 - lr: 0.000064 - momentum: 0.000000
2023-10-12 19:43:46,681 epoch 7 - iter 360/723 - loss 0.02239849 - time (sec): 207.10 - samples/sec: 419.10 - lr: 0.000062 - momentum: 0.000000
2023-10-12 19:44:28,351 epoch 7 - iter 432/723 - loss 0.02093754 - time (sec): 248.77 - samples/sec: 418.93 - lr: 0.000061 - momentum: 0.000000
2023-10-12 19:45:08,141 epoch 7 - iter 504/723 - loss 0.02078108 - time (sec): 288.56 - samples/sec: 422.55 - lr: 0.000059 - momentum: 0.000000
2023-10-12 19:45:47,271 epoch 7 - iter 576/723 - loss 0.02052113 - time (sec): 327.69 - samples/sec: 424.00 - lr: 0.000057 - momentum: 0.000000
2023-10-12 19:46:26,176 epoch 7 - iter 648/723 - loss 0.01984905 - time (sec): 366.59 - samples/sec: 426.95 - lr: 0.000055 - momentum: 0.000000
2023-10-12 19:47:06,241 epoch 7 - iter 720/723 - loss 0.01954505 - time (sec): 406.66 - samples/sec: 431.54 - lr: 0.000053 - momentum: 0.000000
2023-10-12 19:47:07,629 ----------------------------------------------------------------------------------------------------
2023-10-12 19:47:07,630 EPOCH 7 done: loss 0.0195 - lr: 0.000053
2023-10-12 19:47:29,089 DEV : loss 0.10769647359848022 - f1-score (micro avg)  0.8602
2023-10-12 19:47:29,124 ----------------------------------------------------------------------------------------------------
2023-10-12 19:48:08,516 epoch 8 - iter 72/723 - loss 0.01456090 - time (sec): 39.39 - samples/sec: 471.19 - lr: 0.000052 - momentum: 0.000000
2023-10-12 19:48:47,249 epoch 8 - iter 144/723 - loss 0.01638180 - time (sec): 78.12 - samples/sec: 459.65 - lr: 0.000050 - momentum: 0.000000
2023-10-12 19:49:25,637 epoch 8 - iter 216/723 - loss 0.01560092 - time (sec): 116.51 - samples/sec: 453.37 - lr: 0.000048 - momentum: 0.000000
2023-10-12 19:50:05,829 epoch 8 - iter 288/723 - loss 0.01509465 - time (sec): 156.70 - samples/sec: 459.02 - lr: 0.000046 - momentum: 0.000000
2023-10-12 19:50:44,804 epoch 8 - iter 360/723 - loss 0.01478213 - time (sec): 195.68 - samples/sec: 456.37 - lr: 0.000045 - momentum: 0.000000
2023-10-12 19:51:23,381 epoch 8 - iter 432/723 - loss 0.01528980 - time (sec): 234.25 - samples/sec: 451.48 - lr: 0.000043 - momentum: 0.000000
2023-10-12 19:52:02,332 epoch 8 - iter 504/723 - loss 0.01593641 - time (sec): 273.21 - samples/sec: 450.38 - lr: 0.000041 - momentum: 0.000000
2023-10-12 19:52:41,277 epoch 8 - iter 576/723 - loss 0.01546593 - time (sec): 312.15 - samples/sec: 446.97 - lr: 0.000039 - momentum: 0.000000
2023-10-12 19:53:22,174 epoch 8 - iter 648/723 - loss 0.01707868 - time (sec): 353.05 - samples/sec: 447.25 - lr: 0.000037 - momentum: 0.000000
2023-10-12 19:54:02,545 epoch 8 - iter 720/723 - loss 0.01621935 - time (sec): 393.42 - samples/sec: 446.66 - lr: 0.000036 - momentum: 0.000000
2023-10-12 19:54:03,704 ----------------------------------------------------------------------------------------------------
2023-10-12 19:54:03,705 EPOCH 8 done: loss 0.0162 - lr: 0.000036
2023-10-12 19:54:24,898 DEV : loss 0.11788733303546906 - f1-score (micro avg)  0.8613
2023-10-12 19:54:24,929 ----------------------------------------------------------------------------------------------------
2023-10-12 19:55:04,405 epoch 9 - iter 72/723 - loss 0.00435665 - time (sec): 39.47 - samples/sec: 466.21 - lr: 0.000034 - momentum: 0.000000
2023-10-12 19:55:43,405 epoch 9 - iter 144/723 - loss 0.01561935 - time (sec): 78.47 - samples/sec: 474.03 - lr: 0.000032 - momentum: 0.000000
2023-10-12 19:56:21,332 epoch 9 - iter 216/723 - loss 0.01515669 - time (sec): 116.40 - samples/sec: 472.95 - lr: 0.000030 - momentum: 0.000000
2023-10-12 19:56:58,414 epoch 9 - iter 288/723 - loss 0.01423773 - time (sec): 153.48 - samples/sec: 463.50 - lr: 0.000028 - momentum: 0.000000
2023-10-12 19:57:36,161 epoch 9 - iter 360/723 - loss 0.01346231 - time (sec): 191.23 - samples/sec: 454.95 - lr: 0.000027 - momentum: 0.000000
2023-10-12 19:58:16,429 epoch 9 - iter 432/723 - loss 0.01303858 - time (sec): 231.50 - samples/sec: 453.21 - lr: 0.000025 - momentum: 0.000000
2023-10-12 19:58:56,064 epoch 9 - iter 504/723 - loss 0.01320394 - time (sec): 271.13 - samples/sec: 451.94 - lr: 0.000023 - momentum: 0.000000
2023-10-12 19:59:36,961 epoch 9 - iter 576/723 - loss 0.01367903 - time (sec): 312.03 - samples/sec: 453.35 - lr: 0.000021 - momentum: 0.000000
2023-10-12 20:00:16,437 epoch 9 - iter 648/723 - loss 0.01284167 - time (sec): 351.51 - samples/sec: 450.96 - lr: 0.000020 - momentum: 0.000000
2023-10-12 20:00:56,177 epoch 9 - iter 720/723 - loss 0.01273017 - time (sec): 391.25 - samples/sec: 449.00 - lr: 0.000018 - momentum: 0.000000
2023-10-12 20:00:57,336 ----------------------------------------------------------------------------------------------------
2023-10-12 20:00:57,336 EPOCH 9 done: loss 0.0127 - lr: 0.000018
2023-10-12 20:01:18,773 DEV : loss 0.11393096297979355 - f1-score (micro avg)  0.8665
2023-10-12 20:01:18,808 saving best model
2023-10-12 20:01:23,910 ----------------------------------------------------------------------------------------------------
2023-10-12 20:02:03,031 epoch 10 - iter 72/723 - loss 0.00615811 - time (sec): 39.12 - samples/sec: 460.52 - lr: 0.000016 - momentum: 0.000000
2023-10-12 20:02:41,238 epoch 10 - iter 144/723 - loss 0.00676133 - time (sec): 77.32 - samples/sec: 436.23 - lr: 0.000014 - momentum: 0.000000
2023-10-12 20:03:20,138 epoch 10 - iter 216/723 - loss 0.00881176 - time (sec): 116.22 - samples/sec: 436.14 - lr: 0.000012 - momentum: 0.000000
2023-10-12 20:04:00,711 epoch 10 - iter 288/723 - loss 0.01105617 - time (sec): 156.80 - samples/sec: 440.79 - lr: 0.000011 - momentum: 0.000000
2023-10-12 20:04:40,165 epoch 10 - iter 360/723 - loss 0.01009560 - time (sec): 196.25 - samples/sec: 439.76 - lr: 0.000009 - momentum: 0.000000
2023-10-12 20:05:21,218 epoch 10 - iter 432/723 - loss 0.00922833 - time (sec): 237.30 - samples/sec: 441.24 - lr: 0.000007 - momentum: 0.000000
2023-10-12 20:06:01,972 epoch 10 - iter 504/723 - loss 0.00991438 - time (sec): 278.06 - samples/sec: 443.34 - lr: 0.000005 - momentum: 0.000000
2023-10-12 20:06:40,462 epoch 10 - iter 576/723 - loss 0.00967514 - time (sec): 316.55 - samples/sec: 442.17 - lr: 0.000004 - momentum: 0.000000
2023-10-12 20:07:19,416 epoch 10 - iter 648/723 - loss 0.00999608 - time (sec): 355.50 - samples/sec: 443.56 - lr: 0.000002 - momentum: 0.000000
2023-10-12 20:07:58,652 epoch 10 - iter 720/723 - loss 0.00993301 - time (sec): 394.74 - samples/sec: 445.20 - lr: 0.000000 - momentum: 0.000000
2023-10-12 20:07:59,756 ----------------------------------------------------------------------------------------------------
2023-10-12 20:07:59,756 EPOCH 10 done: loss 0.0099 - lr: 0.000000
2023-10-12 20:08:21,111 DEV : loss 0.11986048519611359 - f1-score (micro avg)  0.8657
2023-10-12 20:08:21,984 ----------------------------------------------------------------------------------------------------
2023-10-12 20:08:21,986 Loading model from best epoch ...
2023-10-12 20:08:26,181 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 20:08:47,461
Results:
- F-score (micro) 0.8641
- F-score (macro) 0.7722
- Accuracy 0.7736

By class:
              precision    recall  f1-score   support

         PER     0.8645    0.8734    0.8689       482
         LOC     0.9154    0.8974    0.9063       458
         ORG     0.5625    0.5217    0.5414        69

   micro avg     0.8680    0.8603    0.8641      1009
   macro avg     0.7808    0.7642    0.7722      1009
weighted avg     0.8669    0.8603    0.8635      1009

2023-10-12 20:08:47,461 ----------------------------------------------------------------------------------------------------
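The summary scores are internally consistent with the per-class table: micro-average F1 is the harmonic mean of the micro precision and recall, macro-average F1 is the unweighted mean of the three class F1 scores, and the weighted average weights each class F1 by its support. A quick plain-Python cross-check (independent of Flair, using only the numbers printed above):

```python
# Per-class rows from the evaluation table above: (precision, recall, f1, support)
by_class = {
    "PER": (0.8645, 0.8734, 0.8689, 482),
    "LOC": (0.9154, 0.8974, 0.9063, 458),
    "ORG": (0.5625, 0.5217, 0.5414, 69),
}

p, r = 0.8680, 0.8603                       # micro-avg precision / recall from the table
micro_f1 = 2 * p * r / (p + r)              # harmonic mean of micro precision and recall
macro_f1 = sum(f1 for _, _, f1, _ in by_class.values()) / len(by_class)
total = sum(n for _, _, _, n in by_class.values())
weighted_f1 = sum(f1 * n for _, _, f1, n in by_class.values()) / total

print(round(micro_f1, 4), round(macro_f1, 4), round(weighted_f1, 4))  # 0.8641 0.7722 0.8635
```

All three recomputed values match the reported F-scores; the gap between micro (0.8641) and macro (0.7722) F1 is driven by the small, poorly-recognized ORG class (69 test mentions, F1 0.5414).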