2023-10-06 21:22:02,563 ----------------------------------------------------------------------------------------------------
2023-10-06 21:22:02,565 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-06 21:22:02,565 ----------------------------------------------------------------------------------------------------
2023-10-06 21:22:02,565 MultiCorpus: 1100 train + 206 dev + 240 test sentences
 - NER_HIPE_2022 Corpus: 1100 train + 206 dev + 240 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/de/with_doc_seperator
2023-10-06 21:22:02,565 ----------------------------------------------------------------------------------------------------
2023-10-06 21:22:02,565 Train:  1100 sentences
2023-10-06 21:22:02,565         (train_with_dev=False, train_with_test=False)
2023-10-06 21:22:02,565 ----------------------------------------------------------------------------------------------------
2023-10-06 21:22:02,565 Training Params:
2023-10-06 21:22:02,565  - learning_rate: "0.00015"
2023-10-06 21:22:02,565  - mini_batch_size: "8"
2023-10-06 21:22:02,565  - max_epochs: "10"
2023-10-06 21:22:02,566  - shuffle: "True"
2023-10-06 21:22:02,566 ----------------------------------------------------------------------------------------------------
2023-10-06 21:22:02,566 Plugins:
2023-10-06 21:22:02,566  - TensorboardLogger
2023-10-06 21:22:02,566  - LinearScheduler | warmup_fraction: '0.1'
2023-10-06 21:22:02,566 ----------------------------------------------------------------------------------------------------
2023-10-06 21:22:02,566 Final evaluation on model from best epoch (best-model.pt)
2023-10-06 21:22:02,566  - metric: "('micro avg', 'f1-score')"
2023-10-06 21:22:02,566 ----------------------------------------------------------------------------------------------------
2023-10-06 21:22:02,566 Computation:
2023-10-06 21:22:02,566  - compute on device: cuda:0
2023-10-06 21:22:02,566  - embedding storage: none
2023-10-06 21:22:02,566 ----------------------------------------------------------------------------------------------------
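The learning-rate column in the per-iteration entries follows the LinearScheduler plugin listed above: with warmup_fraction 0.1 over 10 epochs of 138 mini-batches each (1380 steps total), the rate ramps linearly from 0 to the peak 0.00015 during the first 138 steps and then decays linearly back to 0. A minimal sketch of that schedule (the function name and the 0-indexed step convention are illustrative assumptions; Flair's internal implementation may differ in off-by-one details):

```python
def linear_schedule_lr(step, peak_lr=0.00015, total_steps=1380, warmup_fraction=0.1):
    """Linear warmup then linear decay, matching the lr values logged per mini-batch."""
    warmup_steps = int(total_steps * warmup_fraction)  # 138
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Compared against the log (lr appears to be reported for the 0-indexed previous step):
print(round(linear_schedule_lr(12), 6))    # epoch 1, iter 13  -> 1.3e-05
print(round(linear_schedule_lr(129), 6))   # epoch 1, iter 130 -> 0.00014
print(round(linear_schedule_lr(163), 6))   # epoch 2, iter 26  -> 0.000147
```

The momentum column stays at 0.000000 throughout, consistent with an optimizer such as AdamW that does not expose classic SGD momentum.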
2023-10-06 21:22:02,566 Model training base path: "hmbench-ajmc/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1"
2023-10-06 21:22:02,566 ----------------------------------------------------------------------------------------------------
2023-10-06 21:22:02,566 ----------------------------------------------------------------------------------------------------
2023-10-06 21:22:02,567 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-06 21:22:11,886 epoch 1 - iter 13/138 - loss 3.22941656 - time (sec): 9.32 - samples/sec: 229.44 - lr: 0.000013 - momentum: 0.000000
2023-10-06 21:22:21,639 epoch 1 - iter 26/138 - loss 3.22543812 - time (sec): 19.07 - samples/sec: 223.89 - lr: 0.000027 - momentum: 0.000000
2023-10-06 21:22:30,718 epoch 1 - iter 39/138 - loss 3.21697465 - time (sec): 28.15 - samples/sec: 219.58 - lr: 0.000041 - momentum: 0.000000
2023-10-06 21:22:40,061 epoch 1 - iter 52/138 - loss 3.20385144 - time (sec): 37.49 - samples/sec: 218.26 - lr: 0.000055 - momentum: 0.000000
2023-10-06 21:22:51,021 epoch 1 - iter 65/138 - loss 3.17535981 - time (sec): 48.45 - samples/sec: 221.14 - lr: 0.000070 - momentum: 0.000000
2023-10-06 21:23:01,225 epoch 1 - iter 78/138 - loss 3.12396879 - time (sec): 58.66 - samples/sec: 222.72 - lr: 0.000084 - momentum: 0.000000
2023-10-06 21:23:11,073 epoch 1 - iter 91/138 - loss 3.06629202 - time (sec): 68.51 - samples/sec: 222.65 - lr: 0.000098 - momentum: 0.000000
2023-10-06 21:23:20,549 epoch 1 - iter 104/138 - loss 2.99450984 - time (sec): 77.98 - samples/sec: 223.02 - lr: 0.000112 - momentum: 0.000000
2023-10-06 21:23:30,214 epoch 1 - iter 117/138 - loss 2.91692525 - time (sec): 87.65 - samples/sec: 223.28 - lr: 0.000126 - momentum: 0.000000
2023-10-06 21:23:39,021 epoch 1 - iter 130/138 - loss 2.85180662 - time (sec): 96.45 - samples/sec: 223.37 - lr: 0.000140 - momentum: 0.000000
2023-10-06 21:23:44,676 ----------------------------------------------------------------------------------------------------
2023-10-06 21:23:44,676 EPOCH 1 done: loss 2.8056 - lr: 0.000140
2023-10-06 21:23:51,190 DEV : loss 1.8371256589889526 - f1-score (micro avg) 0.0
2023-10-06 21:23:51,197 ----------------------------------------------------------------------------------------------------
2023-10-06 21:24:00,566 epoch 2 - iter 13/138 - loss 1.77694518 - time (sec): 9.37 - samples/sec: 228.65 - lr: 0.000149 - momentum: 0.000000
2023-10-06 21:24:10,360 epoch 2 - iter 26/138 - loss 1.65308199 - time (sec): 19.16 - samples/sec: 226.33 - lr: 0.000147 - momentum: 0.000000
2023-10-06 21:24:19,565 epoch 2 - iter 39/138 - loss 1.59532874 - time (sec): 28.37 - samples/sec: 225.09 - lr: 0.000145 - momentum: 0.000000
2023-10-06 21:24:28,430 epoch 2 - iter 52/138 - loss 1.51112645 - time (sec): 37.23 - samples/sec: 221.02 - lr: 0.000144 - momentum: 0.000000
2023-10-06 21:24:37,827 epoch 2 - iter 65/138 - loss 1.43439996 - time (sec): 46.63 - samples/sec: 220.03 - lr: 0.000142 - momentum: 0.000000
2023-10-06 21:24:47,721 epoch 2 - iter 78/138 - loss 1.37037716 - time (sec): 56.52 - samples/sec: 219.91 - lr: 0.000141 - momentum: 0.000000
2023-10-06 21:24:57,774 epoch 2 - iter 91/138 - loss 1.30689503 - time (sec): 66.58 - samples/sec: 221.16 - lr: 0.000139 - momentum: 0.000000
2023-10-06 21:25:08,118 epoch 2 - iter 104/138 - loss 1.25259669 - time (sec): 76.92 - samples/sec: 221.58 - lr: 0.000138 - momentum: 0.000000
2023-10-06 21:25:17,579 epoch 2 - iter 117/138 - loss 1.21082980 - time (sec): 86.38 - samples/sec: 222.18 - lr: 0.000136 - momentum: 0.000000
2023-10-06 21:25:27,823 epoch 2 - iter 130/138 - loss 1.17062941 - time (sec): 96.62 - samples/sec: 223.57 - lr: 0.000134 - momentum: 0.000000
2023-10-06 21:25:33,077 ----------------------------------------------------------------------------------------------------
2023-10-06 21:25:33,077 EPOCH 2 done: loss 1.1571 - lr: 0.000134
2023-10-06 21:25:39,661 DEV : loss 0.8009519577026367 - f1-score (micro avg) 0.0
2023-10-06 21:25:39,666 ----------------------------------------------------------------------------------------------------
2023-10-06 21:25:49,073 epoch 3 - iter 13/138 - loss 0.75025804 - time (sec): 9.40 - samples/sec: 228.93 - lr: 0.000132 - momentum: 0.000000
2023-10-06 21:25:59,238 epoch 3 - iter 26/138 - loss 0.68311285 - time (sec): 19.57 - samples/sec: 230.29 - lr: 0.000130 - momentum: 0.000000
2023-10-06 21:26:08,925 epoch 3 - iter 39/138 - loss 0.67230598 - time (sec): 29.26 - samples/sec: 229.79 - lr: 0.000129 - momentum: 0.000000
2023-10-06 21:26:19,221 epoch 3 - iter 52/138 - loss 0.65242616 - time (sec): 39.55 - samples/sec: 230.10 - lr: 0.000127 - momentum: 0.000000
2023-10-06 21:26:28,165 epoch 3 - iter 65/138 - loss 0.64808759 - time (sec): 48.50 - samples/sec: 228.82 - lr: 0.000126 - momentum: 0.000000
2023-10-06 21:26:38,589 epoch 3 - iter 78/138 - loss 0.63611414 - time (sec): 58.92 - samples/sec: 228.59 - lr: 0.000124 - momentum: 0.000000
2023-10-06 21:26:47,497 epoch 3 - iter 91/138 - loss 0.61869499 - time (sec): 67.83 - samples/sec: 226.36 - lr: 0.000123 - momentum: 0.000000
2023-10-06 21:26:56,702 epoch 3 - iter 104/138 - loss 0.59904620 - time (sec): 77.03 - samples/sec: 224.32 - lr: 0.000121 - momentum: 0.000000
2023-10-06 21:27:06,232 epoch 3 - iter 117/138 - loss 0.58294330 - time (sec): 86.56 - samples/sec: 223.80 - lr: 0.000119 - momentum: 0.000000
2023-10-06 21:27:16,046 epoch 3 - iter 130/138 - loss 0.56059088 - time (sec): 96.38 - samples/sec: 223.76 - lr: 0.000118 - momentum: 0.000000
2023-10-06 21:27:21,628 ----------------------------------------------------------------------------------------------------
2023-10-06 21:27:21,628 EPOCH 3 done: loss 0.5520 - lr: 0.000118
2023-10-06 21:27:28,179 DEV : loss 0.4173962473869324 - f1-score (micro avg) 0.4178
2023-10-06 21:27:28,185 saving best model
2023-10-06 21:27:29,077 ----------------------------------------------------------------------------------------------------
2023-10-06 21:27:39,635 epoch 4 - iter 13/138 - loss 0.36064320 - time (sec): 10.56 - samples/sec: 235.41 - lr: 0.000115 - momentum: 0.000000
2023-10-06 21:27:49,334 epoch 4 - iter 26/138 - loss 0.34840726 - time (sec): 20.26 - samples/sec: 228.93 - lr: 0.000114 - momentum: 0.000000
2023-10-06 21:27:58,232 epoch 4 - iter 39/138 - loss 0.35511690 - time (sec): 29.15 - samples/sec: 223.20 - lr: 0.000112 - momentum: 0.000000
2023-10-06 21:28:07,588 epoch 4 - iter 52/138 - loss 0.36227900 - time (sec): 38.51 - samples/sec: 224.86 - lr: 0.000111 - momentum: 0.000000
2023-10-06 21:28:17,294 epoch 4 - iter 65/138 - loss 0.35767349 - time (sec): 48.21 - samples/sec: 225.12 - lr: 0.000109 - momentum: 0.000000
2023-10-06 21:28:27,263 epoch 4 - iter 78/138 - loss 0.36400744 - time (sec): 58.18 - samples/sec: 226.27 - lr: 0.000107 - momentum: 0.000000
2023-10-06 21:28:36,823 epoch 4 - iter 91/138 - loss 0.35402502 - time (sec): 67.74 - samples/sec: 225.45 - lr: 0.000106 - momentum: 0.000000
2023-10-06 21:28:46,178 epoch 4 - iter 104/138 - loss 0.35060510 - time (sec): 77.10 - samples/sec: 225.10 - lr: 0.000104 - momentum: 0.000000
2023-10-06 21:28:55,388 epoch 4 - iter 117/138 - loss 0.34836346 - time (sec): 86.31 - samples/sec: 225.52 - lr: 0.000103 - momentum: 0.000000
2023-10-06 21:29:04,770 epoch 4 - iter 130/138 - loss 0.34011041 - time (sec): 95.69 - samples/sec: 224.33 - lr: 0.000101 - momentum: 0.000000
2023-10-06 21:29:10,486 ----------------------------------------------------------------------------------------------------
2023-10-06 21:29:10,487 EPOCH 4 done: loss 0.3342 - lr: 0.000101
2023-10-06 21:29:17,008 DEV : loss 0.2772534191608429 - f1-score (micro avg) 0.678
2023-10-06 21:29:17,014 saving best model
2023-10-06 21:29:17,951 ----------------------------------------------------------------------------------------------------
2023-10-06 21:29:27,445 epoch 5 - iter 13/138 - loss 0.28203934 - time (sec): 9.49 - samples/sec: 223.14 - lr: 0.000099 - momentum: 0.000000
2023-10-06 21:29:36,997 epoch 5 - iter 26/138 - loss 0.28741065 - time (sec): 19.04 - samples/sec: 223.54 - lr: 0.000097 - momentum: 0.000000
2023-10-06 21:29:46,184 epoch 5 - iter 39/138 - loss 0.28406102 - time (sec): 28.23 - samples/sec: 223.20 - lr: 0.000096 - momentum: 0.000000
2023-10-06 21:29:56,798 epoch 5 - iter 52/138 - loss 0.26546922 - time (sec): 38.85 - samples/sec: 227.72 - lr: 0.000094 - momentum: 0.000000
2023-10-06 21:30:06,834 epoch 5 - iter 65/138 - loss 0.25808129 - time (sec): 48.88 - samples/sec: 227.74 - lr: 0.000092 - momentum: 0.000000
2023-10-06 21:30:16,631 epoch 5 - iter 78/138 - loss 0.24922097 - time (sec): 58.68 - samples/sec: 226.66 - lr: 0.000091 - momentum: 0.000000
2023-10-06 21:30:26,957 epoch 5 - iter 91/138 - loss 0.23623014 - time (sec): 69.00 - samples/sec: 226.00 - lr: 0.000089 - momentum: 0.000000
2023-10-06 21:30:36,070 epoch 5 - iter 104/138 - loss 0.23165735 - time (sec): 78.12 - samples/sec: 226.44 - lr: 0.000088 - momentum: 0.000000
2023-10-06 21:30:45,195 epoch 5 - iter 117/138 - loss 0.22952710 - time (sec): 87.24 - samples/sec: 225.40 - lr: 0.000086 - momentum: 0.000000
2023-10-06 21:30:54,385 epoch 5 - iter 130/138 - loss 0.22404895 - time (sec): 96.43 - samples/sec: 225.02 - lr: 0.000085 - momentum: 0.000000
2023-10-06 21:30:59,468 ----------------------------------------------------------------------------------------------------
2023-10-06 21:30:59,468 EPOCH 5 done: loss 0.2240 - lr: 0.000085
2023-10-06 21:31:06,035 DEV : loss 0.19598008692264557 - f1-score (micro avg) 0.7949
2023-10-06 21:31:06,041 saving best model
2023-10-06 21:31:06,969 ----------------------------------------------------------------------------------------------------
2023-10-06 21:31:16,694 epoch 6 - iter 13/138 - loss 0.18634261 - time (sec): 9.72 - samples/sec: 233.67 - lr: 0.000082 - momentum: 0.000000
2023-10-06 21:31:25,765 epoch 6 - iter 26/138 - loss 0.17863421 - time (sec): 18.79 - samples/sec: 226.88 - lr: 0.000080 - momentum: 0.000000
2023-10-06 21:31:35,403 epoch 6 - iter 39/138 - loss 0.17476742 - time (sec): 28.43 - samples/sec: 228.12 - lr: 0.000079 - momentum: 0.000000
2023-10-06 21:31:45,360 epoch 6 - iter 52/138 - loss 0.16747262 - time (sec): 38.39 - samples/sec: 225.87 - lr: 0.000077 - momentum: 0.000000
2023-10-06 21:31:55,067 epoch 6 - iter 65/138 - loss 0.16048866 - time (sec): 48.10 - samples/sec: 225.51 - lr: 0.000076 - momentum: 0.000000
2023-10-06 21:32:04,762 epoch 6 - iter 78/138 - loss 0.15089094 - time (sec): 57.79 - samples/sec: 224.67 - lr: 0.000074 - momentum: 0.000000
2023-10-06 21:32:14,669 epoch 6 - iter 91/138 - loss 0.16037876 - time (sec): 67.70 - samples/sec: 226.54 - lr: 0.000073 - momentum: 0.000000
2023-10-06 21:32:24,055 epoch 6 - iter 104/138 - loss 0.15907285 - time (sec): 77.08 - samples/sec: 226.75 - lr: 0.000071 - momentum: 0.000000
2023-10-06 21:32:33,651 epoch 6 - iter 117/138 - loss 0.15689449 - time (sec): 86.68 - samples/sec: 225.89 - lr: 0.000070 - momentum: 0.000000
2023-10-06 21:32:42,847 epoch 6 - iter 130/138 - loss 0.15494003 - time (sec): 95.88 - samples/sec: 225.55 - lr: 0.000068 - momentum: 0.000000
2023-10-06 21:32:48,360 ----------------------------------------------------------------------------------------------------
2023-10-06 21:32:48,360 EPOCH 6 done: loss 0.1542 - lr: 0.000068
2023-10-06 21:32:54,924 DEV : loss 0.15650679171085358 - f1-score (micro avg) 0.8454
2023-10-06 21:32:54,930 saving best model
2023-10-06 21:32:55,856 ----------------------------------------------------------------------------------------------------
2023-10-06 21:33:05,121 epoch 7 - iter 13/138 - loss 0.11794619 - time (sec): 9.26 - samples/sec: 216.12 - lr: 0.000065 - momentum: 0.000000
2023-10-06 21:33:14,115 epoch 7 - iter 26/138 - loss 0.11257211 - time (sec): 18.26 - samples/sec: 216.51 - lr: 0.000064 - momentum: 0.000000
2023-10-06 21:33:23,709 epoch 7 - iter 39/138 - loss 0.12387657 - time (sec): 27.85 - samples/sec: 220.14 - lr: 0.000062 - momentum: 0.000000
2023-10-06 21:33:33,622 epoch 7 - iter 52/138 - loss 0.11163492 - time (sec): 37.76 - samples/sec: 223.97 - lr: 0.000061 - momentum: 0.000000
2023-10-06 21:33:43,752 epoch 7 - iter 65/138 - loss 0.10943729 - time (sec): 47.89 - samples/sec: 225.25 - lr: 0.000059 - momentum: 0.000000
2023-10-06 21:33:53,050 epoch 7 - iter 78/138 - loss 0.11358028 - time (sec): 57.19 - samples/sec: 223.21 - lr: 0.000058 - momentum: 0.000000
2023-10-06 21:34:02,880 epoch 7 - iter 91/138 - loss 0.11088694 - time (sec): 67.02 - samples/sec: 222.81 - lr: 0.000056 - momentum: 0.000000
2023-10-06 21:34:13,003 epoch 7 - iter 104/138 - loss 0.10771844 - time (sec): 77.15 - samples/sec: 224.68 - lr: 0.000054 - momentum: 0.000000
2023-10-06 21:34:22,527 epoch 7 - iter 117/138 - loss 0.10776106 - time (sec): 86.67 - samples/sec: 225.06 - lr: 0.000053 - momentum: 0.000000
2023-10-06 21:34:32,149 epoch 7 - iter 130/138 - loss 0.11463289 - time (sec): 96.29 - samples/sec: 225.21 - lr: 0.000051 - momentum: 0.000000
2023-10-06 21:34:37,416 ----------------------------------------------------------------------------------------------------
2023-10-06 21:34:37,417 EPOCH 7 done: loss 0.1126 - lr: 0.000051
2023-10-06 21:34:43,989 DEV : loss 0.13818491995334625 - f1-score (micro avg) 0.8507
2023-10-06 21:34:43,995 saving best model
2023-10-06 21:34:44,913 ----------------------------------------------------------------------------------------------------
2023-10-06 21:34:53,678 epoch 8 - iter 13/138 - loss 0.08870371 - time (sec): 8.76 - samples/sec: 214.18 - lr: 0.000049 - momentum: 0.000000
2023-10-06 21:35:03,110 epoch 8 - iter 26/138 - loss 0.10805496 - time (sec): 18.19 - samples/sec: 221.05 - lr: 0.000047 - momentum: 0.000000
2023-10-06 21:35:12,087 epoch 8 - iter 39/138 - loss 0.10456018 - time (sec): 27.17 - samples/sec: 220.96 - lr: 0.000046 - momentum: 0.000000
2023-10-06 21:35:22,263 epoch 8 - iter 52/138 - loss 0.09880645 - time (sec): 37.35 - samples/sec: 224.83 - lr: 0.000044 - momentum: 0.000000
2023-10-06 21:35:31,586 epoch 8 - iter 65/138 - loss 0.10200553 - time (sec): 46.67 - samples/sec: 222.25 - lr: 0.000043 - momentum: 0.000000
2023-10-06 21:35:41,682 epoch 8 - iter 78/138 - loss 0.10250929 - time (sec): 56.77 - samples/sec: 222.98 - lr: 0.000041 - momentum: 0.000000
2023-10-06 21:35:50,638 epoch 8 - iter 91/138 - loss 0.09708388 - time (sec): 65.72 - samples/sec: 223.10 - lr: 0.000039 - momentum: 0.000000
2023-10-06 21:36:01,047 epoch 8 - iter 104/138 - loss 0.09000925 - time (sec): 76.13 - samples/sec: 223.89 - lr: 0.000038 - momentum: 0.000000
2023-10-06 21:36:10,692 epoch 8 - iter 117/138 - loss 0.09244801 - time (sec): 85.78 - samples/sec: 224.36 - lr: 0.000036 - momentum: 0.000000
2023-10-06 21:36:20,752 epoch 8 - iter 130/138 - loss 0.09167540 - time (sec): 95.84 - samples/sec: 224.77 - lr: 0.000035 - momentum: 0.000000
2023-10-06 21:36:26,280 ----------------------------------------------------------------------------------------------------
2023-10-06 21:36:26,280 EPOCH 8 done: loss 0.0900 - lr: 0.000035
2023-10-06 21:36:32,841 DEV : loss 0.13311366736888885 - f1-score (micro avg) 0.8537
2023-10-06 21:36:32,847 saving best model
2023-10-06 21:36:33,772 ----------------------------------------------------------------------------------------------------
2023-10-06 21:36:43,070 epoch 9 - iter 13/138 - loss 0.10113381 - time (sec): 9.30 - samples/sec: 223.54 - lr: 0.000032 - momentum: 0.000000
2023-10-06 21:36:52,341 epoch 9 - iter 26/138 - loss 0.10226201 - time (sec): 18.57 - samples/sec: 223.84 - lr: 0.000031 - momentum: 0.000000
2023-10-06 21:37:02,040 epoch 9 - iter 39/138 - loss 0.09875012 - time (sec): 28.27 - samples/sec: 221.40 - lr: 0.000029 - momentum: 0.000000
2023-10-06 21:37:11,800 epoch 9 - iter 52/138 - loss 0.09425359 - time (sec): 38.03 - samples/sec: 222.22 - lr: 0.000027 - momentum: 0.000000
2023-10-06 21:37:21,488 epoch 9 - iter 65/138 - loss 0.09510257 - time (sec): 47.71 - samples/sec: 222.22 - lr: 0.000026 - momentum: 0.000000
2023-10-06 21:37:30,916 epoch 9 - iter 78/138 - loss 0.09429005 - time (sec): 57.14 - samples/sec: 223.74 - lr: 0.000024 - momentum: 0.000000
2023-10-06 21:37:41,652 epoch 9 - iter 91/138 - loss 0.08655659 - time (sec): 67.88 - samples/sec: 225.23 - lr: 0.000023 - momentum: 0.000000
2023-10-06 21:37:50,906 epoch 9 - iter 104/138 - loss 0.08343271 - time (sec): 77.13 - samples/sec: 224.56 - lr: 0.000021 - momentum: 0.000000
2023-10-06 21:38:00,073 epoch 9 - iter 117/138 - loss 0.07924386 - time (sec): 86.30 - samples/sec: 223.92 - lr: 0.000020 - momentum: 0.000000
2023-10-06 21:38:09,813 epoch 9 - iter 130/138 - loss 0.07670001 - time (sec): 96.04 - samples/sec: 224.53 - lr: 0.000018 - momentum: 0.000000
2023-10-06 21:38:15,191 ----------------------------------------------------------------------------------------------------
2023-10-06 21:38:15,192 EPOCH 9 done: loss 0.0777 - lr: 0.000018
2023-10-06 21:38:21,741 DEV : loss 0.12675325572490692 - f1-score (micro avg) 0.8527
2023-10-06 21:38:21,747 ----------------------------------------------------------------------------------------------------
2023-10-06 21:38:31,559 epoch 10 - iter 13/138 - loss 0.11854547 - time (sec): 9.81 - samples/sec: 222.51 - lr: 0.000016 - momentum: 0.000000
2023-10-06 21:38:41,131 epoch 10 - iter 26/138 - loss 0.09580486 - time (sec): 19.38 - samples/sec: 222.67 - lr: 0.000014 - momentum: 0.000000
2023-10-06 21:38:50,152 epoch 10 - iter 39/138 - loss 0.08844857 - time (sec): 28.40 - samples/sec: 221.63 - lr: 0.000012 - momentum: 0.000000
2023-10-06 21:39:00,295 epoch 10 - iter 52/138 - loss 0.08652563 - time (sec): 38.55 - samples/sec: 224.77 - lr: 0.000011 - momentum: 0.000000
2023-10-06 21:39:09,592 epoch 10 - iter 65/138 - loss 0.07621668 - time (sec): 47.84 - samples/sec: 223.04 - lr: 0.000009 - momentum: 0.000000
2023-10-06 21:39:19,172 epoch 10 - iter 78/138 - loss 0.07584594 - time (sec): 57.42 - samples/sec: 223.39 - lr: 0.000008 - momentum: 0.000000
2023-10-06 21:39:28,716 epoch 10 - iter 91/138 - loss 0.07089306 - time (sec): 66.97 - samples/sec: 222.60 - lr: 0.000006 - momentum: 0.000000
2023-10-06 21:39:37,749 epoch 10 - iter 104/138 - loss 0.07295480 - time (sec): 76.00 - samples/sec: 222.76 - lr: 0.000005 - momentum: 0.000000
2023-10-06 21:39:47,885 epoch 10 - iter 117/138 - loss 0.07254737 - time (sec): 86.14 - samples/sec: 224.18 - lr: 0.000003 - momentum: 0.000000
2023-10-06 21:39:57,117 epoch 10 - iter 130/138 - loss 0.07161321 - time (sec): 95.37 - samples/sec: 224.53 - lr: 0.000001 - momentum: 0.000000
2023-10-06 21:40:03,017 ----------------------------------------------------------------------------------------------------
2023-10-06 21:40:03,018 EPOCH 10 done: loss 0.0718 - lr: 0.000001
2023-10-06 21:40:09,577 DEV : loss 0.1252679079771042 - f1-score (micro avg) 0.8551
2023-10-06 21:40:09,583 saving best model
2023-10-06 21:40:11,527 ----------------------------------------------------------------------------------------------------
2023-10-06 21:40:11,546 Loading model from best epoch ...
2023-10-06 21:40:14,522 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-object, B-object, E-object, I-object, S-date, B-date, E-date, I-date
2023-10-06 21:40:21,612 Results:
- F-score (micro) 0.8846
- F-score (macro) 0.5252
- Accuracy 0.8177

By class:
              precision    recall  f1-score   support

       scope     0.9000    0.9205    0.9101       176
        pers     0.8815    0.9297    0.9049       128
        work     0.8108    0.8108    0.8108        74
      object     0.0000    0.0000    0.0000         2
         loc     0.0000    0.0000    0.0000         2

   micro avg     0.8766    0.8927    0.8846       382
   macro avg     0.5185    0.5322    0.5252       382
weighted avg     0.8671    0.8927    0.8796       382

2023-10-06 21:40:21,613 ----------------------------------------------------------------------------------------------------
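As a sanity check on the summary figures, the macro, weighted, and micro averages can be recomputed from the per-class rows of the table above. A quick sketch (the inputs are the table's rounded values, so the recomputed averages agree with the logged ones at four decimal places):

```python
# Per-class (precision, recall, f1, support) from the final evaluation table.
by_class = {
    "scope":  (0.9000, 0.9205, 0.9101, 176),
    "pers":   (0.8815, 0.9297, 0.9049, 128),
    "work":   (0.8108, 0.8108, 0.8108, 74),
    "object": (0.0000, 0.0000, 0.0000, 2),
    "loc":    (0.0000, 0.0000, 0.0000, 2),
}

# Macro average: unweighted mean over classes, so the two 2-support
# classes with F1 = 0 pull it far below the micro average.
macro_f1 = sum(f1 for _, _, f1, _ in by_class.values()) / len(by_class)

# Weighted average: mean weighted by class support.
total = sum(s for _, _, _, s in by_class.values())
weighted_f1 = sum(f1 * s for _, _, f1, s in by_class.values()) / total

# Micro F1: harmonic mean of the pooled precision and recall.
micro_p, micro_r = 0.8766, 0.8927
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

print(round(macro_f1, 4))     # 0.5252, as logged
print(round(weighted_f1, 4))  # 0.8796, as logged
print(round(micro_f1, 4))     # 0.8846, as logged
```

The gap between micro F1 (0.8846) and macro F1 (0.5252) is entirely due to the `object` and `loc` classes, each with only 2 test instances and no correct predictions.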