2023-10-06 22:58:57,336 ----------------------------------------------------------------------------------------------------
2023-10-06 22:58:57,337 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-06 22:58:57,337 ----------------------------------------------------------------------------------------------------
2023-10-06 22:58:57,337 MultiCorpus: 1100 train + 206 dev + 240 test sentences
 - NER_HIPE_2022 Corpus: 1100 train + 206 dev + 240 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/de/with_doc_seperator
2023-10-06 22:58:57,337 ----------------------------------------------------------------------------------------------------
2023-10-06 22:58:57,337 Train: 1100 sentences
2023-10-06 22:58:57,337 (train_with_dev=False, train_with_test=False)
2023-10-06 22:58:57,337 ----------------------------------------------------------------------------------------------------
2023-10-06 22:58:57,337 Training Params:
2023-10-06 22:58:57,338 - learning_rate: "0.00016"
2023-10-06 22:58:57,338 - mini_batch_size: "8"
2023-10-06 22:58:57,338 - max_epochs: "10"
2023-10-06 22:58:57,338 - shuffle: "True"
2023-10-06 22:58:57,338 ----------------------------------------------------------------------------------------------------
2023-10-06 22:58:57,338 Plugins:
2023-10-06 22:58:57,338 - TensorboardLogger
2023-10-06 22:58:57,338 - LinearScheduler | warmup_fraction: '0.1'
2023-10-06 22:58:57,338 ----------------------------------------------------------------------------------------------------
2023-10-06 22:58:57,338 Final evaluation on model from best epoch (best-model.pt)
2023-10-06 22:58:57,338 - metric: "('micro avg', 'f1-score')"
2023-10-06 22:58:57,338 ----------------------------------------------------------------------------------------------------
2023-10-06 22:58:57,338 Computation:
2023-10-06 22:58:57,338 - compute on device: cuda:0
2023-10-06 22:58:57,338 - embedding storage: none
2023-10-06 22:58:57,338
----------------------------------------------------------------------------------------------------
2023-10-06 22:58:57,338 Model training base path: "hmbench-ajmc/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2"
2023-10-06 22:58:57,338 ----------------------------------------------------------------------------------------------------
2023-10-06 22:58:57,338 ----------------------------------------------------------------------------------------------------
2023-10-06 22:58:57,339 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-06 22:59:06,855 epoch 1 - iter 13/138 - loss 3.23260884 - time (sec): 9.52 - samples/sec: 217.65 - lr: 0.000014 - momentum: 0.000000
2023-10-06 22:59:16,828 epoch 1 - iter 26/138 - loss 3.22664289 - time (sec): 19.49 - samples/sec: 219.88 - lr: 0.000029 - momentum: 0.000000
2023-10-06 22:59:27,070 epoch 1 - iter 39/138 - loss 3.21831912 - time (sec): 29.73 - samples/sec: 224.66 - lr: 0.000044 - momentum: 0.000000
2023-10-06 22:59:36,298 epoch 1 - iter 52/138 - loss 3.20533098 - time (sec): 38.96 - samples/sec: 220.83 - lr: 0.000059 - momentum: 0.000000
2023-10-06 22:59:46,161 epoch 1 - iter 65/138 - loss 3.17618858 - time (sec): 48.82 - samples/sec: 222.03 - lr: 0.000074 - momentum: 0.000000
2023-10-06 22:59:55,032 epoch 1 - iter 78/138 - loss 3.13303145 - time (sec): 57.69 - samples/sec: 218.78 - lr: 0.000089 - momentum: 0.000000
2023-10-06 23:00:04,668 epoch 1 - iter 91/138 - loss 3.06421490 - time (sec): 67.33 - samples/sec: 220.64 - lr: 0.000104 - momentum: 0.000000
2023-10-06 23:00:14,565 epoch 1 - iter 104/138 - loss 2.98492127 - time (sec): 77.23 - samples/sec: 222.46 - lr: 0.000119 - momentum: 0.000000
2023-10-06 23:00:24,071 epoch 1 - iter 117/138 - loss 2.90531533 - time (sec): 86.73 - samples/sec: 222.37 - lr: 0.000134 - momentum: 0.000000
2023-10-06 23:00:34,215 epoch 1 - iter 130/138 - loss 2.81157005 - time (sec): 96.88 - samples/sec: 223.64 - lr: 0.000150 - momentum: 0.000000
2023-10-06 23:00:39,326 ----------------------------------------------------------------------------------------------------
2023-10-06 23:00:39,327 EPOCH 1 done: loss 2.7695 - lr: 0.000150
2023-10-06 23:00:45,852 DEV : loss 1.7750880718231201 - f1-score (micro avg) 0.0
2023-10-06 23:00:45,857 ----------------------------------------------------------------------------------------------------
2023-10-06 23:00:55,023 epoch 2 - iter 13/138 - loss 1.74050783 - time (sec): 9.16 - samples/sec: 228.28 - lr: 0.000158 - momentum: 0.000000
2023-10-06 23:01:04,564 epoch 2 - iter 26/138 - loss 1.64644696 - time (sec): 18.71 - samples/sec: 228.00 - lr: 0.000157 - momentum: 0.000000
2023-10-06 23:01:13,842 epoch 2 - iter 39/138 - loss 1.54065063 - time (sec): 27.98 - samples/sec: 224.96 - lr: 0.000155 - momentum: 0.000000
2023-10-06 23:01:23,303 epoch 2 - iter 52/138 - loss 1.46805938 - time (sec): 37.44 - samples/sec: 226.63 - lr: 0.000153 - momentum: 0.000000
2023-10-06 23:01:32,861 epoch 2 - iter 65/138 - loss 1.38258633 - time (sec): 47.00 - samples/sec: 225.77 - lr: 0.000152 - momentum: 0.000000
2023-10-06 23:01:43,028 epoch 2 - iter 78/138 - loss 1.30259193 - time (sec): 57.17 - samples/sec: 224.95 - lr: 0.000150 - momentum: 0.000000
2023-10-06 23:01:52,535 epoch 2 - iter 91/138 - loss 1.24837081 - time (sec): 66.68 - samples/sec: 222.87 - lr: 0.000148 - momentum: 0.000000
2023-10-06 23:02:02,396 epoch 2 - iter 104/138 - loss 1.17710429 - time (sec): 76.54 - samples/sec: 222.09 - lr: 0.000147 - momentum: 0.000000
2023-10-06 23:02:12,192 epoch 2 - iter 117/138 - loss 1.13771314 - time (sec): 86.33 - samples/sec: 222.39 - lr: 0.000145 - momentum: 0.000000
2023-10-06 23:02:22,018 epoch 2 - iter 130/138 - loss 1.11624221 - time (sec): 96.16 - samples/sec: 223.58 - lr: 0.000143 - momentum: 0.000000
2023-10-06 23:02:27,563 ----------------------------------------------------------------------------------------------------
2023-10-06 23:02:27,564 EPOCH 2 done: loss 1.0979 - lr: 0.000143
2023-10-06 23:02:34,092 DEV : loss 0.8365278840065002 - f1-score (micro avg) 0.0
2023-10-06 23:02:34,098 ----------------------------------------------------------------------------------------------------
2023-10-06 23:02:43,285 epoch 3 - iter 13/138 - loss 0.74763496 - time (sec): 9.19 - samples/sec: 224.49 - lr: 0.000141 - momentum: 0.000000
2023-10-06 23:02:53,395 epoch 3 - iter 26/138 - loss 0.68351995 - time (sec): 19.30 - samples/sec: 228.76 - lr: 0.000139 - momentum: 0.000000
2023-10-06 23:03:03,327 epoch 3 - iter 39/138 - loss 0.68174302 - time (sec): 29.23 - samples/sec: 228.48 - lr: 0.000137 - momentum: 0.000000
2023-10-06 23:03:12,774 epoch 3 - iter 52/138 - loss 0.65561942 - time (sec): 38.67 - samples/sec: 227.18 - lr: 0.000136 - momentum: 0.000000
2023-10-06 23:03:21,680 epoch 3 - iter 65/138 - loss 0.62826164 - time (sec): 47.58 - samples/sec: 224.08 - lr: 0.000134 - momentum: 0.000000
2023-10-06 23:03:31,566 epoch 3 - iter 78/138 - loss 0.60223906 - time (sec): 57.47 - samples/sec: 224.97 - lr: 0.000132 - momentum: 0.000000
2023-10-06 23:03:41,089 epoch 3 - iter 91/138 - loss 0.59746947 - time (sec): 66.99 - samples/sec: 226.05 - lr: 0.000131 - momentum: 0.000000
2023-10-06 23:03:50,415 epoch 3 - iter 104/138 - loss 0.58196954 - time (sec): 76.32 - samples/sec: 225.54 - lr: 0.000129 - momentum: 0.000000
2023-10-06 23:04:00,169 epoch 3 - iter 117/138 - loss 0.56269360 - time (sec): 86.07 - samples/sec: 224.78 - lr: 0.000127 - momentum: 0.000000
2023-10-06 23:04:10,241 epoch 3 - iter 130/138 - loss 0.54502390 - time (sec): 96.14 - samples/sec: 224.97 - lr: 0.000126 - momentum: 0.000000
2023-10-06 23:04:15,443 ----------------------------------------------------------------------------------------------------
2023-10-06 23:04:15,444 EPOCH 3 done: loss 0.5434 - lr: 0.000126
2023-10-06 23:04:22,144 DEV : loss 0.42084553837776184 - f1-score (micro avg) 0.4644
2023-10-06 23:04:22,149 saving best model
2023-10-06 23:04:23,037 ----------------------------------------------------------------------------------------------------
2023-10-06 23:04:32,983 epoch 4 - iter 13/138 - loss 0.37308005 - time (sec): 9.95 - samples/sec: 230.47 - lr: 0.000123 - momentum: 0.000000
2023-10-06 23:04:42,743 epoch 4 - iter 26/138 - loss 0.37178256 - time (sec): 19.71 - samples/sec: 230.29 - lr: 0.000121 - momentum: 0.000000
2023-10-06 23:04:52,104 epoch 4 - iter 39/138 - loss 0.38038232 - time (sec): 29.07 - samples/sec: 229.61 - lr: 0.000120 - momentum: 0.000000
2023-10-06 23:05:01,555 epoch 4 - iter 52/138 - loss 0.36998733 - time (sec): 38.52 - samples/sec: 229.48 - lr: 0.000118 - momentum: 0.000000
2023-10-06 23:05:10,838 epoch 4 - iter 65/138 - loss 0.36466479 - time (sec): 47.80 - samples/sec: 227.70 - lr: 0.000116 - momentum: 0.000000
2023-10-06 23:05:20,108 epoch 4 - iter 78/138 - loss 0.35554322 - time (sec): 57.07 - samples/sec: 226.85 - lr: 0.000115 - momentum: 0.000000
2023-10-06 23:05:30,224 epoch 4 - iter 91/138 - loss 0.34302395 - time (sec): 67.19 - samples/sec: 226.45 - lr: 0.000113 - momentum: 0.000000
2023-10-06 23:05:39,490 epoch 4 - iter 104/138 - loss 0.32952578 - time (sec): 76.45 - samples/sec: 225.36 - lr: 0.000111 - momentum: 0.000000
2023-10-06 23:05:48,553 epoch 4 - iter 117/138 - loss 0.32926164 - time (sec): 85.51 - samples/sec: 225.22 - lr: 0.000110 - momentum: 0.000000
2023-10-06 23:05:59,050 epoch 4 - iter 130/138 - loss 0.32528695 - time (sec): 96.01 - samples/sec: 225.21 - lr: 0.000108 - momentum: 0.000000
2023-10-06 23:06:04,430 ----------------------------------------------------------------------------------------------------
2023-10-06 23:06:04,431 EPOCH 4 done: loss 0.3208 - lr: 0.000108
2023-10-06 23:06:11,149 DEV : loss 0.2693938910961151 - f1-score (micro avg) 0.7293
2023-10-06 23:06:11,155 saving best model
2023-10-06 23:06:12,078 ----------------------------------------------------------------------------------------------------
2023-10-06 23:06:22,157 epoch 5 - iter 13/138 - loss 0.30238062 - time (sec): 10.08 - samples/sec: 238.35 - lr: 0.000105 - momentum: 0.000000
2023-10-06 23:06:31,875 epoch 5 - iter 26/138 - loss 0.27286214 - time (sec): 19.80 - samples/sec: 228.23 - lr: 0.000104 - momentum: 0.000000
2023-10-06 23:06:41,313 epoch 5 - iter 39/138 - loss 0.24692950 - time (sec): 29.23 - samples/sec: 226.76 - lr: 0.000102 - momentum: 0.000000
2023-10-06 23:06:51,165 epoch 5 - iter 52/138 - loss 0.24348959 - time (sec): 39.09 - samples/sec: 227.12 - lr: 0.000100 - momentum: 0.000000
2023-10-06 23:06:59,926 epoch 5 - iter 65/138 - loss 0.22808092 - time (sec): 47.85 - samples/sec: 224.03 - lr: 0.000099 - momentum: 0.000000
2023-10-06 23:07:09,648 epoch 5 - iter 78/138 - loss 0.21968723 - time (sec): 57.57 - samples/sec: 224.39 - lr: 0.000097 - momentum: 0.000000
2023-10-06 23:07:19,937 epoch 5 - iter 91/138 - loss 0.22183698 - time (sec): 67.86 - samples/sec: 224.82 - lr: 0.000095 - momentum: 0.000000
2023-10-06 23:07:29,313 epoch 5 - iter 104/138 - loss 0.21706942 - time (sec): 77.23 - samples/sec: 223.52 - lr: 0.000094 - momentum: 0.000000
2023-10-06 23:07:39,130 epoch 5 - iter 117/138 - loss 0.21193136 - time (sec): 87.05 - samples/sec: 223.17 - lr: 0.000092 - momentum: 0.000000
2023-10-06 23:07:48,198 epoch 5 - iter 130/138 - loss 0.20982920 - time (sec): 96.12 - samples/sec: 222.96 - lr: 0.000090 - momentum: 0.000000
2023-10-06 23:07:54,086 ----------------------------------------------------------------------------------------------------
2023-10-06 23:07:54,086 EPOCH 5 done: loss 0.2114 - lr: 0.000090
2023-10-06 23:08:00,681 DEV : loss 0.18538373708724976 - f1-score (micro avg) 0.7967
2023-10-06 23:08:00,688 saving best model
2023-10-06 23:08:01,811 ----------------------------------------------------------------------------------------------------
2023-10-06 23:08:11,657 epoch 6 - iter 13/138 - loss 0.16319693 - time (sec): 9.84 - samples/sec: 227.44 - lr: 0.000088 - momentum: 0.000000
2023-10-06 23:08:21,633 epoch 6 - iter 26/138 - loss 0.15158098 - time (sec): 19.82 - samples/sec: 227.49 - lr: 0.000086 - momentum: 0.000000
2023-10-06 23:08:31,878 epoch 6 - iter 39/138 - loss 0.15606503 - time (sec): 30.07 - samples/sec: 231.06 - lr: 0.000084 - momentum: 0.000000
2023-10-06 23:08:41,551 epoch 6 - iter 52/138 - loss 0.15716148 - time (sec): 39.74 - samples/sec: 229.67 - lr: 0.000083 - momentum: 0.000000
2023-10-06 23:08:50,500 epoch 6 - iter 65/138 - loss 0.15729222 - time (sec): 48.69 - samples/sec: 227.99 - lr: 0.000081 - momentum: 0.000000
2023-10-06 23:08:59,181 epoch 6 - iter 78/138 - loss 0.15916206 - time (sec): 57.37 - samples/sec: 225.94 - lr: 0.000079 - momentum: 0.000000
2023-10-06 23:09:08,673 epoch 6 - iter 91/138 - loss 0.15618419 - time (sec): 66.86 - samples/sec: 225.89 - lr: 0.000077 - momentum: 0.000000
2023-10-06 23:09:18,046 epoch 6 - iter 104/138 - loss 0.15634928 - time (sec): 76.23 - samples/sec: 226.20 - lr: 0.000076 - momentum: 0.000000
2023-10-06 23:09:27,387 epoch 6 - iter 117/138 - loss 0.15288518 - time (sec): 85.57 - samples/sec: 225.27 - lr: 0.000074 - momentum: 0.000000
2023-10-06 23:09:37,198 epoch 6 - iter 130/138 - loss 0.14857979 - time (sec): 95.39 - samples/sec: 225.57 - lr: 0.000072 - momentum: 0.000000
2023-10-06 23:09:42,925 ----------------------------------------------------------------------------------------------------
2023-10-06 23:09:42,925 EPOCH 6 done: loss 0.1454 - lr: 0.000072
2023-10-06 23:09:49,462 DEV : loss 0.14982718229293823 - f1-score (micro avg) 0.8445
2023-10-06 23:09:49,468 saving best model
2023-10-06 23:09:50,384 ----------------------------------------------------------------------------------------------------
2023-10-06 23:10:00,510 epoch 7 - iter 13/138 - loss 0.09494636 - time (sec): 10.12 - samples/sec: 233.39 - lr: 0.000070 - momentum: 0.000000
2023-10-06 23:10:09,910 epoch 7 - iter 26/138 - loss 0.10748605 - time (sec): 19.52 - samples/sec: 229.10 - lr: 0.000068 - momentum: 0.000000
2023-10-06 23:10:19,333 epoch 7 - iter 39/138 - loss 0.10060464 - time (sec): 28.95 - samples/sec: 225.44 - lr: 0.000066 - momentum: 0.000000
2023-10-06 23:10:28,126 epoch 7 - iter 52/138 - loss 0.10299215 - time (sec): 37.74 - samples/sec: 224.08 - lr: 0.000065 - momentum: 0.000000
2023-10-06 23:10:37,593 epoch 7 - iter 65/138 - loss 0.10195734 - time (sec): 47.21 - samples/sec: 223.73 - lr: 0.000063 - momentum: 0.000000
2023-10-06 23:10:47,569 epoch 7 - iter 78/138 - loss 0.10684372 - time (sec): 57.18 - samples/sec: 223.91 - lr: 0.000061 - momentum: 0.000000
2023-10-06 23:10:57,047 epoch 7 - iter 91/138 - loss 0.10506388 - time (sec): 66.66 - samples/sec: 223.44 - lr: 0.000060 - momentum: 0.000000
2023-10-06 23:11:06,535 epoch 7 - iter 104/138 - loss 0.10829319 - time (sec): 76.15 - samples/sec: 223.84 - lr: 0.000058 - momentum: 0.000000
2023-10-06 23:11:15,533 epoch 7 - iter 117/138 - loss 0.11329062 - time (sec): 85.15 - samples/sec: 223.51 - lr: 0.000056 - momentum: 0.000000
2023-10-06 23:11:25,484 epoch 7 - iter 130/138 - loss 0.11073173 - time (sec): 95.10 - samples/sec: 224.85 - lr: 0.000055 - momentum: 0.000000
2023-10-06 23:11:31,498 ----------------------------------------------------------------------------------------------------
2023-10-06 23:11:31,498 EPOCH 7 done: loss 0.1111 - lr: 0.000055
2023-10-06 23:11:38,012 DEV : loss 0.13896368443965912 - f1-score (micro avg) 0.8462
2023-10-06 23:11:38,017 saving best model
2023-10-06 23:11:38,935 ----------------------------------------------------------------------------------------------------
2023-10-06 23:11:48,231 epoch 8 - iter 13/138 - loss 0.10634901 - time (sec): 9.29 - samples/sec: 224.00 - lr: 0.000052 - momentum: 0.000000
2023-10-06 23:11:57,413 epoch 8 - iter 26/138 - loss 0.09809801 - time (sec): 18.48 - samples/sec: 220.22 - lr: 0.000050 - momentum: 0.000000
2023-10-06 23:12:07,733 epoch 8 - iter 39/138 - loss 0.10574517 - time (sec): 28.80 - samples/sec: 226.83 - lr: 0.000049 - momentum: 0.000000
2023-10-06 23:12:17,940 epoch 8 - iter 52/138 - loss 0.10606045 - time (sec): 39.00 - samples/sec: 227.34 - lr: 0.000047 - momentum: 0.000000
2023-10-06 23:12:27,739 epoch 8 - iter 65/138 - loss 0.09912891 - time (sec): 48.80 - samples/sec: 226.48 - lr: 0.000045 - momentum: 0.000000
2023-10-06 23:12:36,936 epoch 8 - iter 78/138 - loss 0.09361155 - time (sec): 58.00 - samples/sec: 225.48 - lr: 0.000044 - momentum: 0.000000
2023-10-06 23:12:46,511 epoch 8 - iter 91/138 - loss 0.09318882 - time (sec): 67.58 - samples/sec: 224.55 - lr: 0.000042 - momentum: 0.000000
2023-10-06 23:12:56,128 epoch 8 - iter 104/138 - loss 0.09671327 - time (sec): 77.19 - samples/sec: 225.00 - lr: 0.000040 - momentum: 0.000000
2023-10-06 23:13:05,555 epoch 8 - iter 117/138 - loss 0.09120858 - time (sec): 86.62 - samples/sec: 223.75 - lr: 0.000039 - momentum: 0.000000
2023-10-06 23:13:14,979 epoch 8 - iter 130/138 - loss 0.08938503 - time (sec): 96.04 - samples/sec: 222.56 - lr: 0.000037 - momentum: 0.000000
2023-10-06 23:13:21,015 ----------------------------------------------------------------------------------------------------
2023-10-06 23:13:21,016 EPOCH 8 done: loss 0.0877 - lr: 0.000037
2023-10-06 23:13:27,658 DEV : loss 0.12855148315429688 - f1-score (micro avg) 0.8674
2023-10-06 23:13:27,664 saving best model
2023-10-06 23:13:28,593 ----------------------------------------------------------------------------------------------------
2023-10-06 23:13:38,083 epoch 9 - iter 13/138 - loss 0.10617215 - time (sec): 9.49 - samples/sec: 229.03 - lr: 0.000034 - momentum: 0.000000
2023-10-06 23:13:48,802 epoch 9 - iter 26/138 - loss 0.07976676 - time (sec): 20.21 - samples/sec: 231.20 - lr: 0.000033 - momentum: 0.000000
2023-10-06 23:13:57,973 epoch 9 - iter 39/138 - loss 0.07881498 - time (sec): 29.38 - samples/sec: 225.17 - lr: 0.000031 - momentum: 0.000000
2023-10-06 23:14:07,515 epoch 9 - iter 52/138 - loss 0.07289011 - time (sec): 38.92 - samples/sec: 222.40 - lr: 0.000029 - momentum: 0.000000
2023-10-06 23:14:17,229 epoch 9 - iter 65/138 - loss 0.07185365 - time (sec): 48.63 - samples/sec: 222.29 - lr: 0.000028 - momentum: 0.000000
2023-10-06 23:14:27,550 epoch 9 - iter 78/138 - loss 0.06812614 - time (sec): 58.95 - samples/sec: 224.36 - lr: 0.000026 - momentum: 0.000000
2023-10-06 23:14:36,753 epoch 9 - iter 91/138 - loss 0.06472198 - time (sec): 68.16 - samples/sec: 223.14 - lr: 0.000024 - momentum: 0.000000
2023-10-06 23:14:46,067 epoch 9 - iter 104/138 - loss 0.06685049 - time (sec): 77.47 - samples/sec: 221.99 - lr: 0.000023 - momentum: 0.000000
2023-10-06 23:14:54,877 epoch 9 - iter 117/138 - loss 0.07294624 - time (sec): 86.28 - samples/sec: 221.48 - lr: 0.000021 - momentum: 0.000000
2023-10-06 23:15:04,874 epoch 9 - iter 130/138 - loss 0.07491454 - time (sec): 96.28 - samples/sec: 222.34 - lr: 0.000019 - momentum: 0.000000
2023-10-06 23:15:10,950 ----------------------------------------------------------------------------------------------------
2023-10-06 23:15:10,950 EPOCH 9 done: loss 0.0765 - lr: 0.000019
2023-10-06 23:15:17,620 DEV : loss 0.12676303088665009 - f1-score (micro avg) 0.864
2023-10-06 23:15:17,626 ----------------------------------------------------------------------------------------------------
2023-10-06 23:15:26,943 epoch 10 - iter 13/138 - loss 0.06158931 - time (sec): 9.32 - samples/sec: 227.46 - lr: 0.000017 - momentum: 0.000000
2023-10-06 23:15:36,435 epoch 10 - iter 26/138 - loss 0.06378250 - time (sec): 18.81 - samples/sec: 226.34 - lr: 0.000015 - momentum: 0.000000
2023-10-06 23:15:45,917 epoch 10 - iter 39/138 - loss 0.07297983 - time (sec): 28.29 - samples/sec: 222.58 - lr: 0.000013 - momentum: 0.000000
2023-10-06 23:15:55,175 epoch 10 - iter 52/138 - loss 0.06889318 - time (sec): 37.55 - samples/sec: 219.32 - lr: 0.000012 - momentum: 0.000000
2023-10-06 23:16:06,014 epoch 10 - iter 65/138 - loss 0.07591846 - time (sec): 48.39 - samples/sec: 220.93 - lr: 0.000010 - momentum: 0.000000
2023-10-06 23:16:16,057 epoch 10 - iter 78/138 - loss 0.07350222 - time (sec): 58.43 - samples/sec: 221.65 - lr: 0.000008 - momentum: 0.000000
2023-10-06 23:16:25,558 epoch 10 - iter 91/138 - loss 0.07126380 - time (sec): 67.93 - samples/sec: 222.04 - lr: 0.000007 - momentum: 0.000000
2023-10-06 23:16:35,274 epoch 10 - iter 104/138 - loss 0.06908311 - time (sec): 77.65 - samples/sec: 222.62 - lr: 0.000005 - momentum: 0.000000
2023-10-06 23:16:45,529 epoch 10 - iter 117/138 - loss 0.06851633 - time (sec): 87.90 - samples/sec: 223.37 - lr: 0.000003 - momentum: 0.000000
2023-10-06 23:16:54,812 epoch 10 - iter 130/138 - loss 0.06804469 - time (sec): 97.19 - samples/sec: 222.50 - lr: 0.000002 - momentum: 0.000000
2023-10-06 23:17:00,014 ----------------------------------------------------------------------------------------------------
2023-10-06 23:17:00,015 EPOCH 10 done: loss 0.0704 - lr: 0.000002
2023-10-06 23:17:06,713 DEV : loss 0.12650427222251892 - f1-score (micro avg) 0.8684
2023-10-06 23:17:06,719 saving best model
2023-10-06 23:17:08,627 ----------------------------------------------------------------------------------------------------
2023-10-06 23:17:08,628 Loading model from best epoch ...
2023-10-06 23:17:11,377 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-object, B-object, E-object, I-object, S-date, B-date, E-date, I-date
2023-10-06 23:17:18,587
Results:
- F-score (micro) 0.8779
- F-score (macro) 0.5227
- Accuracy 0.8009

By class:
              precision    recall  f1-score   support

       scope     0.8827    0.8977    0.8901       176
        pers     0.9015    0.9297    0.9154       128
        work     0.7922    0.8243    0.8079        74
      object     0.0000    0.0000    0.0000         2
         loc     0.0000    0.0000    0.0000         2

   micro avg     0.8711    0.8848    0.8779       382
   macro avg     0.5153    0.5303    0.5227       382
weighted avg     0.8622    0.8848    0.8734       382

2023-10-06 23:17:18,588 ----------------------------------------------------------------------------------------------------