2023-10-11 03:31:48,290 ----------------------------------------------------------------------------------------------------
2023-10-11 03:31:48,292 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 03:31:48,292 ----------------------------------------------------------------------------------------------------
2023-10-11 03:31:48,292 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
 - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-11 03:31:48,293 ----------------------------------------------------------------------------------------------------
2023-10-11 03:31:48,293 Train: 7142 sentences
2023-10-11 03:31:48,293 (train_with_dev=False, train_with_test=False)
2023-10-11 03:31:48,293 ----------------------------------------------------------------------------------------------------
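For orientation, the corpus and the tagger architecture printed above roughly correspond to the following Flair setup. This is a minimal sketch, not the original training script: the log's ByT5Embeddings is a project-specific wrapper, so Flair's stock TransformerWordEmbeddings is used here as a stand-in, and the HIPE-2022 loader arguments are assumptions read off the dataset path above and the model base path further down.

```python
# Sketch only: rebuild the corpus and tagger shown above (argument names partly assumed).
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger

# NewsEye/French split of HIPE-2022, with document separators (see dataset path in the log)
corpus = NER_HIPE_2022(dataset_name="newseye", language="fr", add_document_separator=True)
label_dict = corpus.make_label_dictionary(label_type="ner")  # BIOES tags over PER/LOC/ORG/HumanProd

# hmByT5 encoder; "poolingfirst" and "layers-1" in the base path suggest these options
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# No RNN and no CRF ("crfFalse"), so only the Linear(1472, 17) head sits on the encoder
tagger = SequenceTagger(
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)
```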
2023-10-11 03:31:48,293 Training Params:
2023-10-11 03:31:48,293 - learning_rate: "0.00016"
2023-10-11 03:31:48,293 - mini_batch_size: "8"
2023-10-11 03:31:48,293 - max_epochs: "10"
2023-10-11 03:31:48,293 - shuffle: "True"
2023-10-11 03:31:48,293 ----------------------------------------------------------------------------------------------------
2023-10-11 03:31:48,293 Plugins:
2023-10-11 03:31:48,293 - TensorboardLogger
2023-10-11 03:31:48,293 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 03:31:48,293 ----------------------------------------------------------------------------------------------------
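The training parameters and plugins listed above map onto a ModelTrainer.fine_tune call along the following lines. In recent Flair versions fine_tune attaches the linear warmup/decay scheduler (warmup_fraction 0.1) by default; the TensorboardLogger class name is taken from the plugin list above, but its import path and default arguments are assumptions.

```python
# Continuing the sketch above: fine-tune with the hyper-parameters from this log.
from flair.trainers import ModelTrainer
from flair.trainers.plugins import TensorboardLogger  # import path is an assumption

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2",
    learning_rate=0.00016,
    mini_batch_size=8,
    max_epochs=10,
    shuffle=True,
    main_evaluation_metric=("micro avg", "f1-score"),
    plugins=[TensorboardLogger()],  # LinearScheduler is added by fine_tune itself
)
```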
2023-10-11 03:31:48,293 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 03:31:48,294 - metric: "('micro avg', 'f1-score')"
2023-10-11 03:31:48,294 ----------------------------------------------------------------------------------------------------
2023-10-11 03:31:48,294 Computation:
2023-10-11 03:31:48,294 - compute on device: cuda:0
2023-10-11 03:31:48,294 - embedding storage: none
2023-10-11 03:31:48,294 ----------------------------------------------------------------------------------------------------
2023-10-11 03:31:48,294 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2"
2023-10-11 03:31:48,294 ----------------------------------------------------------------------------------------------------
2023-10-11 03:31:48,294 ----------------------------------------------------------------------------------------------------
2023-10-11 03:31:48,294 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 03:32:38,651 epoch 1 - iter 89/893 - loss 2.83712457 - time (sec): 50.36 - samples/sec: 527.35 - lr: 0.000016 - momentum: 0.000000
2023-10-11 03:33:27,129 epoch 1 - iter 178/893 - loss 2.76856474 - time (sec): 98.83 - samples/sec: 523.88 - lr: 0.000032 - momentum: 0.000000
2023-10-11 03:34:17,244 epoch 1 - iter 267/893 - loss 2.56645727 - time (sec): 148.95 - samples/sec: 521.49 - lr: 0.000048 - momentum: 0.000000
2023-10-11 03:35:06,565 epoch 1 - iter 356/893 - loss 2.33448011 - time (sec): 198.27 - samples/sec: 519.23 - lr: 0.000064 - momentum: 0.000000
2023-10-11 03:36:00,091 epoch 1 - iter 445/893 - loss 2.10432695 - time (sec): 251.80 - samples/sec: 505.41 - lr: 0.000080 - momentum: 0.000000
2023-10-11 03:36:51,822 epoch 1 - iter 534/893 - loss 1.88059820 - time (sec): 303.53 - samples/sec: 504.20 - lr: 0.000095 - momentum: 0.000000
2023-10-11 03:37:42,586 epoch 1 - iter 623/893 - loss 1.70794075 - time (sec): 354.29 - samples/sec: 502.17 - lr: 0.000111 - momentum: 0.000000
2023-10-11 03:38:32,514 epoch 1 - iter 712/893 - loss 1.57064750 - time (sec): 404.22 - samples/sec: 497.50 - lr: 0.000127 - momentum: 0.000000
2023-10-11 03:39:22,687 epoch 1 - iter 801/893 - loss 1.45016177 - time (sec): 454.39 - samples/sec: 494.09 - lr: 0.000143 - momentum: 0.000000
2023-10-11 03:40:13,992 epoch 1 - iter 890/893 - loss 1.34843504 - time (sec): 505.70 - samples/sec: 490.92 - lr: 0.000159 - momentum: 0.000000
2023-10-11 03:40:15,430 ----------------------------------------------------------------------------------------------------
2023-10-11 03:40:15,430 EPOCH 1 done: loss 1.3466 - lr: 0.000159
2023-10-11 03:40:35,102 DEV : loss 0.2833699882030487 - f1-score (micro avg) 0.1705
2023-10-11 03:40:35,135 saving best model
2023-10-11 03:40:36,033 ----------------------------------------------------------------------------------------------------
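The lr column follows a one-cycle linear schedule: with a warmup fraction of 0.1 over 10 × 893 batches, the rate climbs to the 0.00016 peak during epoch 1 and then decays linearly to zero by the end of epoch 10. A quick arithmetic check of that reading (plain Python, not Flair internals, assuming a plain linear warmup/decay):

```python
# Reproduce the logged lr values under a simple linear warmup/decay assumption.
peak_lr = 0.00016
steps_per_epoch = 893
total_steps = 10 * steps_per_epoch        # 8930
warmup_steps = 0.1 * total_steps          # 893 -> warmup spans exactly epoch 1

def lr_at(step: int) -> float:
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

print(round(lr_at(89), 6))    # ~0.000016, matches "epoch 1 - iter 89/893"
print(round(lr_at(890), 6))   # ~0.000159, matches "epoch 1 - iter 890/893"
print(round(lr_at(982), 6))   # ~0.000158, matches "epoch 2 - iter 89/893"
```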
2023-10-11 03:41:27,920 epoch 2 - iter 89/893 - loss 0.31854275 - time (sec): 51.89 - samples/sec: 484.38 - lr: 0.000158 - momentum: 0.000000
2023-10-11 03:42:20,133 epoch 2 - iter 178/893 - loss 0.30105422 - time (sec): 104.10 - samples/sec: 494.94 - lr: 0.000156 - momentum: 0.000000
2023-10-11 03:43:10,984 epoch 2 - iter 267/893 - loss 0.28521306 - time (sec): 154.95 - samples/sec: 492.72 - lr: 0.000155 - momentum: 0.000000
2023-10-11 03:44:01,012 epoch 2 - iter 356/893 - loss 0.26928774 - time (sec): 204.98 - samples/sec: 490.36 - lr: 0.000153 - momentum: 0.000000
2023-10-11 03:44:50,907 epoch 2 - iter 445/893 - loss 0.25483997 - time (sec): 254.87 - samples/sec: 488.75 - lr: 0.000151 - momentum: 0.000000
2023-10-11 03:45:42,828 epoch 2 - iter 534/893 - loss 0.24373078 - time (sec): 306.79 - samples/sec: 483.89 - lr: 0.000149 - momentum: 0.000000
2023-10-11 03:46:32,863 epoch 2 - iter 623/893 - loss 0.23211977 - time (sec): 356.83 - samples/sec: 482.34 - lr: 0.000148 - momentum: 0.000000
2023-10-11 03:47:24,191 epoch 2 - iter 712/893 - loss 0.22071424 - time (sec): 408.16 - samples/sec: 483.45 - lr: 0.000146 - momentum: 0.000000
2023-10-11 03:48:15,045 epoch 2 - iter 801/893 - loss 0.21094737 - time (sec): 459.01 - samples/sec: 486.71 - lr: 0.000144 - momentum: 0.000000
2023-10-11 03:49:04,867 epoch 2 - iter 890/893 - loss 0.20114803 - time (sec): 508.83 - samples/sec: 487.40 - lr: 0.000142 - momentum: 0.000000
2023-10-11 03:49:06,375 ----------------------------------------------------------------------------------------------------
2023-10-11 03:49:06,376 EPOCH 2 done: loss 0.2009 - lr: 0.000142
2023-10-11 03:49:28,304 DEV : loss 0.11237920075654984 - f1-score (micro avg) 0.7166
2023-10-11 03:49:28,336 saving best model
2023-10-11 03:49:30,905 ----------------------------------------------------------------------------------------------------
2023-10-11 03:50:22,507 epoch 3 - iter 89/893 - loss 0.09483268 - time (sec): 51.60 - samples/sec: 500.89 - lr: 0.000140 - momentum: 0.000000
2023-10-11 03:51:12,638 epoch 3 - iter 178/893 - loss 0.09263939 - time (sec): 101.73 - samples/sec: 478.43 - lr: 0.000139 - momentum: 0.000000
2023-10-11 03:52:02,580 epoch 3 - iter 267/893 - loss 0.09132766 - time (sec): 151.67 - samples/sec: 487.98 - lr: 0.000137 - momentum: 0.000000
2023-10-11 03:52:51,470 epoch 3 - iter 356/893 - loss 0.09076148 - time (sec): 200.56 - samples/sec: 489.76 - lr: 0.000135 - momentum: 0.000000
2023-10-11 03:53:42,098 epoch 3 - iter 445/893 - loss 0.08608269 - time (sec): 251.19 - samples/sec: 489.35 - lr: 0.000133 - momentum: 0.000000
2023-10-11 03:54:31,559 epoch 3 - iter 534/893 - loss 0.08616002 - time (sec): 300.65 - samples/sec: 492.76 - lr: 0.000132 - momentum: 0.000000
2023-10-11 03:55:21,855 epoch 3 - iter 623/893 - loss 0.08359586 - time (sec): 350.95 - samples/sec: 498.06 - lr: 0.000130 - momentum: 0.000000
2023-10-11 03:56:11,917 epoch 3 - iter 712/893 - loss 0.08345896 - time (sec): 401.01 - samples/sec: 497.89 - lr: 0.000128 - momentum: 0.000000
2023-10-11 03:57:01,663 epoch 3 - iter 801/893 - loss 0.08264139 - time (sec): 450.75 - samples/sec: 496.55 - lr: 0.000126 - momentum: 0.000000
2023-10-11 03:57:53,429 epoch 3 - iter 890/893 - loss 0.08208483 - time (sec): 502.52 - samples/sec: 493.86 - lr: 0.000125 - momentum: 0.000000
2023-10-11 03:57:54,954 ----------------------------------------------------------------------------------------------------
2023-10-11 03:57:54,954 EPOCH 3 done: loss 0.0822 - lr: 0.000125
2023-10-11 03:58:16,128 DEV : loss 0.09909958392381668 - f1-score (micro avg) 0.7795
2023-10-11 03:58:16,162 saving best model
2023-10-11 03:58:18,810 ----------------------------------------------------------------------------------------------------
2023-10-11 03:59:10,515 epoch 4 - iter 89/893 - loss 0.05729177 - time (sec): 51.70 - samples/sec: 478.76 - lr: 0.000123 - momentum: 0.000000
2023-10-11 04:00:01,270 epoch 4 - iter 178/893 - loss 0.05132805 - time (sec): 102.46 - samples/sec: 466.91 - lr: 0.000121 - momentum: 0.000000
2023-10-11 04:00:53,652 epoch 4 - iter 267/893 - loss 0.05168862 - time (sec): 154.84 - samples/sec: 477.37 - lr: 0.000119 - momentum: 0.000000
2023-10-11 04:01:46,002 epoch 4 - iter 356/893 - loss 0.05215912 - time (sec): 207.19 - samples/sec: 485.67 - lr: 0.000117 - momentum: 0.000000
2023-10-11 04:02:34,693 epoch 4 - iter 445/893 - loss 0.05079508 - time (sec): 255.88 - samples/sec: 483.73 - lr: 0.000116 - momentum: 0.000000
2023-10-11 04:03:24,842 epoch 4 - iter 534/893 - loss 0.05130903 - time (sec): 306.03 - samples/sec: 485.64 - lr: 0.000114 - momentum: 0.000000
2023-10-11 04:04:15,425 epoch 4 - iter 623/893 - loss 0.05198355 - time (sec): 356.61 - samples/sec: 489.38 - lr: 0.000112 - momentum: 0.000000
2023-10-11 04:05:05,416 epoch 4 - iter 712/893 - loss 0.05235379 - time (sec): 406.60 - samples/sec: 488.86 - lr: 0.000110 - momentum: 0.000000
2023-10-11 04:05:55,152 epoch 4 - iter 801/893 - loss 0.05215177 - time (sec): 456.34 - samples/sec: 488.91 - lr: 0.000109 - momentum: 0.000000
2023-10-11 04:06:45,629 epoch 4 - iter 890/893 - loss 0.05242878 - time (sec): 506.82 - samples/sec: 489.86 - lr: 0.000107 - momentum: 0.000000
2023-10-11 04:06:47,035 ----------------------------------------------------------------------------------------------------
2023-10-11 04:06:47,035 EPOCH 4 done: loss 0.0523 - lr: 0.000107
2023-10-11 04:07:09,433 DEV : loss 0.11436811834573746 - f1-score (micro avg) 0.7913
2023-10-11 04:07:09,464 saving best model
2023-10-11 04:07:12,074 ----------------------------------------------------------------------------------------------------
2023-10-11 04:08:03,836 epoch 5 - iter 89/893 - loss 0.04449798 - time (sec): 51.76 - samples/sec: 480.80 - lr: 0.000105 - momentum: 0.000000
2023-10-11 04:08:54,019 epoch 5 - iter 178/893 - loss 0.04209674 - time (sec): 101.94 - samples/sec: 466.28 - lr: 0.000103 - momentum: 0.000000
2023-10-11 04:09:45,794 epoch 5 - iter 267/893 - loss 0.04174679 - time (sec): 153.72 - samples/sec: 470.86 - lr: 0.000101 - momentum: 0.000000
2023-10-11 04:10:36,630 epoch 5 - iter 356/893 - loss 0.04092342 - time (sec): 204.55 - samples/sec: 475.82 - lr: 0.000100 - momentum: 0.000000
2023-10-11 04:11:25,166 epoch 5 - iter 445/893 - loss 0.04167428 - time (sec): 253.09 - samples/sec: 479.16 - lr: 0.000098 - momentum: 0.000000
2023-10-11 04:12:14,988 epoch 5 - iter 534/893 - loss 0.04028609 - time (sec): 302.91 - samples/sec: 482.87 - lr: 0.000096 - momentum: 0.000000
2023-10-11 04:13:04,569 epoch 5 - iter 623/893 - loss 0.04068915 - time (sec): 352.49 - samples/sec: 490.07 - lr: 0.000094 - momentum: 0.000000
2023-10-11 04:13:54,291 epoch 5 - iter 712/893 - loss 0.04043929 - time (sec): 402.21 - samples/sec: 491.30 - lr: 0.000093 - momentum: 0.000000
2023-10-11 04:14:47,153 epoch 5 - iter 801/893 - loss 0.03903387 - time (sec): 455.07 - samples/sec: 488.39 - lr: 0.000091 - momentum: 0.000000
2023-10-11 04:15:37,354 epoch 5 - iter 890/893 - loss 0.03924486 - time (sec): 505.28 - samples/sec: 490.96 - lr: 0.000089 - momentum: 0.000000
2023-10-11 04:15:38,821 ----------------------------------------------------------------------------------------------------
2023-10-11 04:15:38,821 EPOCH 5 done: loss 0.0392 - lr: 0.000089
2023-10-11 04:16:00,816 DEV : loss 0.1378975659608841 - f1-score (micro avg) 0.7957
2023-10-11 04:16:00,847 saving best model
2023-10-11 04:16:03,500 ----------------------------------------------------------------------------------------------------
2023-10-11 04:16:53,271 epoch 6 - iter 89/893 - loss 0.02839597 - time (sec): 49.77 - samples/sec: 497.48 - lr: 0.000087 - momentum: 0.000000
2023-10-11 04:17:43,066 epoch 6 - iter 178/893 - loss 0.02969544 - time (sec): 99.56 - samples/sec: 500.48 - lr: 0.000085 - momentum: 0.000000
2023-10-11 04:18:32,285 epoch 6 - iter 267/893 - loss 0.02762991 - time (sec): 148.78 - samples/sec: 500.47 - lr: 0.000084 - momentum: 0.000000
2023-10-11 04:19:21,556 epoch 6 - iter 356/893 - loss 0.02793419 - time (sec): 198.05 - samples/sec: 501.04 - lr: 0.000082 - momentum: 0.000000
2023-10-11 04:20:11,648 epoch 6 - iter 445/893 - loss 0.02784170 - time (sec): 248.14 - samples/sec: 500.36 - lr: 0.000080 - momentum: 0.000000
2023-10-11 04:21:02,540 epoch 6 - iter 534/893 - loss 0.02833000 - time (sec): 299.04 - samples/sec: 502.29 - lr: 0.000078 - momentum: 0.000000
2023-10-11 04:21:51,592 epoch 6 - iter 623/893 - loss 0.02781878 - time (sec): 348.09 - samples/sec: 501.02 - lr: 0.000077 - momentum: 0.000000
2023-10-11 04:22:41,493 epoch 6 - iter 712/893 - loss 0.02807748 - time (sec): 397.99 - samples/sec: 501.31 - lr: 0.000075 - momentum: 0.000000
2023-10-11 04:23:32,397 epoch 6 - iter 801/893 - loss 0.02889368 - time (sec): 448.89 - samples/sec: 501.88 - lr: 0.000073 - momentum: 0.000000
2023-10-11 04:24:20,319 epoch 6 - iter 890/893 - loss 0.02940820 - time (sec): 496.81 - samples/sec: 499.77 - lr: 0.000071 - momentum: 0.000000
2023-10-11 04:24:21,612 ----------------------------------------------------------------------------------------------------
2023-10-11 04:24:21,612 EPOCH 6 done: loss 0.0294 - lr: 0.000071
2023-10-11 04:24:43,256 DEV : loss 0.1482623666524887 - f1-score (micro avg) 0.8011
2023-10-11 04:24:43,286 saving best model
2023-10-11 04:24:45,866 ----------------------------------------------------------------------------------------------------
2023-10-11 04:25:37,622 epoch 7 - iter 89/893 - loss 0.02216521 - time (sec): 51.75 - samples/sec: 528.67 - lr: 0.000069 - momentum: 0.000000
2023-10-11 04:26:27,082 epoch 7 - iter 178/893 - loss 0.02497855 - time (sec): 101.21 - samples/sec: 513.77 - lr: 0.000068 - momentum: 0.000000
2023-10-11 04:27:15,549 epoch 7 - iter 267/893 - loss 0.02422271 - time (sec): 149.68 - samples/sec: 505.79 - lr: 0.000066 - momentum: 0.000000
2023-10-11 04:28:04,841 epoch 7 - iter 356/893 - loss 0.02369475 - time (sec): 198.97 - samples/sec: 502.47 - lr: 0.000064 - momentum: 0.000000
2023-10-11 04:28:54,382 epoch 7 - iter 445/893 - loss 0.02341763 - time (sec): 248.51 - samples/sec: 500.38 - lr: 0.000062 - momentum: 0.000000
2023-10-11 04:29:44,889 epoch 7 - iter 534/893 - loss 0.02214307 - time (sec): 299.02 - samples/sec: 501.17 - lr: 0.000061 - momentum: 0.000000
2023-10-11 04:30:33,572 epoch 7 - iter 623/893 - loss 0.02180161 - time (sec): 347.70 - samples/sec: 499.52 - lr: 0.000059 - momentum: 0.000000
2023-10-11 04:31:25,741 epoch 7 - iter 712/893 - loss 0.02147087 - time (sec): 399.87 - samples/sec: 500.34 - lr: 0.000057 - momentum: 0.000000
2023-10-11 04:32:17,029 epoch 7 - iter 801/893 - loss 0.02193032 - time (sec): 451.16 - samples/sec: 497.92 - lr: 0.000055 - momentum: 0.000000
2023-10-11 04:33:08,676 epoch 7 - iter 890/893 - loss 0.02228242 - time (sec): 502.81 - samples/sec: 493.32 - lr: 0.000053 - momentum: 0.000000
2023-10-11 04:33:10,184 ----------------------------------------------------------------------------------------------------
2023-10-11 04:33:10,184 EPOCH 7 done: loss 0.0222 - lr: 0.000053
2023-10-11 04:33:31,045 DEV : loss 0.16061536967754364 - f1-score (micro avg) 0.7955
2023-10-11 04:33:31,076 ----------------------------------------------------------------------------------------------------
2023-10-11 04:34:18,933 epoch 8 - iter 89/893 - loss 0.01609981 - time (sec): 47.86 - samples/sec: 503.12 - lr: 0.000052 - momentum: 0.000000
2023-10-11 04:35:08,167 epoch 8 - iter 178/893 - loss 0.01581475 - time (sec): 97.09 - samples/sec: 500.45 - lr: 0.000050 - momentum: 0.000000
2023-10-11 04:35:58,306 epoch 8 - iter 267/893 - loss 0.01509213 - time (sec): 147.23 - samples/sec: 499.65 - lr: 0.000048 - momentum: 0.000000
2023-10-11 04:36:49,077 epoch 8 - iter 356/893 - loss 0.01826549 - time (sec): 198.00 - samples/sec: 501.64 - lr: 0.000046 - momentum: 0.000000
2023-10-11 04:37:38,846 epoch 8 - iter 445/893 - loss 0.01880460 - time (sec): 247.77 - samples/sec: 498.52 - lr: 0.000045 - momentum: 0.000000
2023-10-11 04:38:29,491 epoch 8 - iter 534/893 - loss 0.01951515 - time (sec): 298.41 - samples/sec: 497.04 - lr: 0.000043 - momentum: 0.000000
2023-10-11 04:39:19,883 epoch 8 - iter 623/893 - loss 0.01961892 - time (sec): 348.81 - samples/sec: 495.24 - lr: 0.000041 - momentum: 0.000000
2023-10-11 04:40:11,819 epoch 8 - iter 712/893 - loss 0.01925037 - time (sec): 400.74 - samples/sec: 495.14 - lr: 0.000039 - momentum: 0.000000
2023-10-11 04:41:01,099 epoch 8 - iter 801/893 - loss 0.01911293 - time (sec): 450.02 - samples/sec: 494.92 - lr: 0.000037 - momentum: 0.000000
2023-10-11 04:41:50,784 epoch 8 - iter 890/893 - loss 0.01886592 - time (sec): 499.71 - samples/sec: 495.88 - lr: 0.000036 - momentum: 0.000000
2023-10-11 04:41:52,459 ----------------------------------------------------------------------------------------------------
2023-10-11 04:41:52,459 EPOCH 8 done: loss 0.0189 - lr: 0.000036
2023-10-11 04:42:13,821 DEV : loss 0.17750906944274902 - f1-score (micro avg) 0.8094
2023-10-11 04:42:13,851 saving best model
2023-10-11 04:42:16,513 ----------------------------------------------------------------------------------------------------
2023-10-11 04:43:08,166 epoch 9 - iter 89/893 - loss 0.01085198 - time (sec): 51.65 - samples/sec: 499.86 - lr: 0.000034 - momentum: 0.000000
2023-10-11 04:43:58,320 epoch 9 - iter 178/893 - loss 0.01500659 - time (sec): 101.80 - samples/sec: 493.47 - lr: 0.000032 - momentum: 0.000000
2023-10-11 04:44:49,824 epoch 9 - iter 267/893 - loss 0.01365750 - time (sec): 153.31 - samples/sec: 488.25 - lr: 0.000030 - momentum: 0.000000
2023-10-11 04:45:39,299 epoch 9 - iter 356/893 - loss 0.01461229 - time (sec): 202.78 - samples/sec: 489.26 - lr: 0.000029 - momentum: 0.000000
2023-10-11 04:46:28,453 epoch 9 - iter 445/893 - loss 0.01350951 - time (sec): 251.94 - samples/sec: 491.86 - lr: 0.000027 - momentum: 0.000000
2023-10-11 04:47:17,606 epoch 9 - iter 534/893 - loss 0.01395459 - time (sec): 301.09 - samples/sec: 493.39 - lr: 0.000025 - momentum: 0.000000
2023-10-11 04:48:05,086 epoch 9 - iter 623/893 - loss 0.01396505 - time (sec): 348.57 - samples/sec: 492.02 - lr: 0.000023 - momentum: 0.000000
2023-10-11 04:48:54,590 epoch 9 - iter 712/893 - loss 0.01473988 - time (sec): 398.07 - samples/sec: 494.86 - lr: 0.000022 - momentum: 0.000000
2023-10-11 04:49:44,244 epoch 9 - iter 801/893 - loss 0.01491773 - time (sec): 447.73 - samples/sec: 498.75 - lr: 0.000020 - momentum: 0.000000
2023-10-11 04:50:32,777 epoch 9 - iter 890/893 - loss 0.01457607 - time (sec): 496.26 - samples/sec: 499.51 - lr: 0.000018 - momentum: 0.000000
2023-10-11 04:50:34,391 ----------------------------------------------------------------------------------------------------
2023-10-11 04:50:34,391 EPOCH 9 done: loss 0.0146 - lr: 0.000018
2023-10-11 04:50:55,729 DEV : loss 0.18004417419433594 - f1-score (micro avg) 0.8005
2023-10-11 04:50:55,761 ----------------------------------------------------------------------------------------------------
2023-10-11 04:51:43,006 epoch 10 - iter 89/893 - loss 0.01256475 - time (sec): 47.24 - samples/sec: 497.36 - lr: 0.000016 - momentum: 0.000000
2023-10-11 04:52:33,364 epoch 10 - iter 178/893 - loss 0.01131584 - time (sec): 97.60 - samples/sec: 495.44 - lr: 0.000014 - momentum: 0.000000
2023-10-11 04:53:24,321 epoch 10 - iter 267/893 - loss 0.01086775 - time (sec): 148.56 - samples/sec: 497.67 - lr: 0.000013 - momentum: 0.000000
2023-10-11 04:54:13,799 epoch 10 - iter 356/893 - loss 0.01031103 - time (sec): 198.04 - samples/sec: 493.94 - lr: 0.000011 - momentum: 0.000000
2023-10-11 04:55:04,876 epoch 10 - iter 445/893 - loss 0.01121298 - time (sec): 249.11 - samples/sec: 497.61 - lr: 0.000009 - momentum: 0.000000
2023-10-11 04:55:56,369 epoch 10 - iter 534/893 - loss 0.01191879 - time (sec): 300.61 - samples/sec: 501.27 - lr: 0.000007 - momentum: 0.000000
2023-10-11 04:56:45,593 epoch 10 - iter 623/893 - loss 0.01245382 - time (sec): 349.83 - samples/sec: 496.96 - lr: 0.000006 - momentum: 0.000000
2023-10-11 04:57:34,479 epoch 10 - iter 712/893 - loss 0.01244378 - time (sec): 398.72 - samples/sec: 499.96 - lr: 0.000004 - momentum: 0.000000
2023-10-11 04:58:23,033 epoch 10 - iter 801/893 - loss 0.01246859 - time (sec): 447.27 - samples/sec: 500.35 - lr: 0.000002 - momentum: 0.000000
2023-10-11 04:59:10,669 epoch 10 - iter 890/893 - loss 0.01218169 - time (sec): 494.91 - samples/sec: 501.30 - lr: 0.000000 - momentum: 0.000000
2023-10-11 04:59:12,200 ----------------------------------------------------------------------------------------------------
2023-10-11 04:59:12,200 EPOCH 10 done: loss 0.0122 - lr: 0.000000
2023-10-11 04:59:34,117 DEV : loss 0.185577392578125 - f1-score (micro avg) 0.8069
2023-10-11 04:59:35,005 ----------------------------------------------------------------------------------------------------
2023-10-11 04:59:35,007 Loading model from best epoch ...
2023-10-11 04:59:39,747 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
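The 17 entries are simply the BIOES expansion of the four entity types plus the outside tag, which is also where the out_features=17 of the final linear layer in the model dump comes from:

```python
# The 17-entry tag dictionary above: BIOES prefixes over 4 entity types, plus "O".
entity_types = ["PER", "LOC", "ORG", "HumanProd"]
tags = ["O"] + [f"{prefix}-{t}" for t in entity_types for prefix in ("S", "B", "E", "I")]
assert len(tags) == 17  # matches Linear(in_features=1472, out_features=17)
```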
2023-10-11 05:00:48,937
Results:
- F-score (micro) 0.69
- F-score (macro) 0.6101
- Accuracy 0.5402

By class:
              precision    recall  f1-score   support

         LOC     0.6901    0.6913    0.6907      1095
         PER     0.7741    0.7856    0.7798      1012
         ORG     0.4310    0.5602    0.4872       357
   HumanProd     0.3889    0.6364    0.4828        33

   micro avg     0.6711    0.7101    0.6900      2497
   macro avg     0.5710    0.6684    0.6101      2497
weighted avg     0.6831    0.7101    0.6950      2497

2023-10-11 05:00:48,937 ----------------------------------------------------------------------------------------------------
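To reuse the checkpoint behind these test scores, the saved best model can be loaded and applied with standard Flair calls. The path below is the training base path from this log; the example sentence is made up for illustration.

```python
# Load the saved best model and tag a sentence.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load(
    "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2/best-model.pt"
)

sentence = Sentence("Le Journal de Genève publie une dépêche de Paris.")
tagger.predict(sentence)
for span in sentence.get_spans("ner"):
    print(span.text, span.tag, round(span.score, 2))
```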