2024-09-05 13:34:48.258223: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-09-05 13:34:48.275126: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-05 13:34:48.295825: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-05 13:34:48.302058: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-05 13:34:48.316617: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-05 13:34:49.554072: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/transformers/training_args.py:1525: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
09/05/2024 13:34:51 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False
09/05/2024 13:34:51 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
  _n_gpu=1,
  accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
  adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, batch_eval_metrics=False,
  bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_persistent_workers=False,
  dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None,
  ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=None, disable_tqdm=False, dispatch_batches=None,
  do_eval=True, do_predict=True, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True,
  eval_on_start=False, eval_steps=None, eval_strategy=epoch, eval_use_gather_object=False, evaluation_strategy=epoch,
  fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[],
  fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
  fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=2,
  gradient_checkpointing=False, gradient_checkpointing_kwargs=None, greater_is_better=True, group_by_length=False,
  half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=,
  ignore_data_skip=False, include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False,
  jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0,
  learning_rate=5e-05, length_column_name=length, load_best_model_at_end=True, local_rank=0, log_level=passive, log_level_replica=warning,
  log_on_each_node=True, logging_dir=/content/dissertation/scripts/ner/output/tb, logging_first_step=False, logging_nan_inf_filter=True,
  logging_steps=500, logging_strategy=steps, lr_scheduler_kwargs={}, lr_scheduler_type=linear, max_grad_norm=1.0, max_steps=-1,
  metric_for_best_model=f1, mp_parameters=, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=10.0,
  optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/content/dissertation/scripts/ner/output,
  overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=32, prediction_loss_only=False,
  push_to_hub=True, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last,
  remove_unused_columns=True, report_to=['tensorboard'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None,
  run_name=/content/dissertation/scripts/ner/output, save_on_each_node=False, save_only_model=False, save_safetensors=True,
  save_steps=500, save_strategy=epoch, save_total_limit=None, seed=42, skip_memory_metrics=True, split_batches=None, tf32=None,
  torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None,
  tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False,
  warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0,
)
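Restated for readability, the non-default settings in the dump above amount to roughly the following. This is an illustrative Python sketch, not the run's actual script, which may instead pass these values as command-line flags:

```python
from transformers import TrainingArguments

# Sketch reconstructing the non-default arguments visible in the log above;
# everything not listed stays at its Transformers 4.44 default.
training_args = TrainingArguments(
    output_dir="/content/dissertation/scripts/ner/output",
    overwrite_output_dir=True,
    do_train=True,
    do_eval=True,
    do_predict=True,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    learning_rate=5e-5,
    num_train_epochs=10,
    eval_strategy="epoch",   # the deprecated alias `evaluation_strategy` is what triggers the FutureWarning above
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    greater_is_better=True,
    logging_dir="/content/dissertation/scripts/ner/output/tb",
    report_to=["tensorboard"],
    push_to_hub=True,
    seed=42,
)
```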
Downloading builder script: 0%| | 0.00/3.61k [00:00<…]
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--IVN-RIN--bioBIT/snapshots/83755ed79ee254c11854e9f54a53679557271018/config.json
[INFO|configuration_utils.py:800] 2024-09-05 13:35:02,050 >> Model config BertConfig {
  "_name_or_path": "IVN-RIN/bioBIT",
  "architectures": ["BertForMaskedLM"],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "finetuning_task": "ner",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {"0": "O", "1": "B-FARMACO", "2": "I-FARMACO"},
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {"B-FARMACO": 1, "I-FARMACO": 2, "O": 0},
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.44.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 31102
}
[INFO|tokenization_utils_base.py:2269] 2024-09-05 13:35:02,109 >> loading file vocab.txt from cache at /root/.cache/huggingface/hub/models--IVN-RIN--bioBIT/snapshots/83755ed79ee254c11854e9f54a53679557271018/vocab.txt
[INFO|tokenization_utils_base.py:2269] 2024-09-05 13:35:02,109 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--IVN-RIN--bioBIT/snapshots/83755ed79ee254c11854e9f54a53679557271018/tokenizer.json
[INFO|tokenization_utils_base.py:2269] 2024-09-05 13:35:02,109 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2269] 2024-09-05 13:35:02,109 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--IVN-RIN--bioBIT/snapshots/83755ed79ee254c11854e9f54a53679557271018/special_tokens_map.json
[INFO|tokenization_utils_base.py:2269] 2024-09-05 13:35:02,109 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--IVN-RIN--bioBIT/snapshots/83755ed79ee254c11854e9f54a53679557271018/tokenizer_config.json
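The config above defines a three-label FARMACO tagging scheme on top of an Italian biomedical BERT. A minimal loading sketch consistent with that config, assuming the script uses the Auto* classes (the variable names here are illustrative, not taken from the run), would be:

```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForTokenClassification

label_list = ["O", "B-FARMACO", "I-FARMACO"]  # from the id2label mapping above

config = AutoConfig.from_pretrained(
    "IVN-RIN/bioBIT",
    num_labels=len(label_list),
    id2label=dict(enumerate(label_list)),
    label2id={label: i for i, label in enumerate(label_list)},
    finetuning_task="ner",
)
tokenizer = AutoTokenizer.from_pretrained("IVN-RIN/bioBIT")
model = AutoModelForTokenClassification.from_pretrained("IVN-RIN/bioBIT", config=config)
# The warnings that follow are expected: the checkpoint's masked-LM head (cls.predictions.*)
# is discarded and a fresh token-classification head (classifier.weight / classifier.bias)
# is randomly initialized, hence the advice to fine-tune before using the model for inference.
```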
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will then be set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
[INFO|modeling_utils.py:3678] 2024-09-05 13:35:02,174 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--IVN-RIN--bioBIT/snapshots/83755ed79ee254c11854e9f54a53679557271018/model.safetensors
[INFO|modeling_utils.py:4497] 2024-09-05 13:35:02,231 >> Some weights of the model checkpoint at IVN-RIN/bioBIT were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:4509] 2024-09-05 13:35:02,231 >> Some weights of BertForTokenClassification were not initialized from the model checkpoint at IVN-RIN/bioBIT and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Map: 0%| | 0/30642 [00:00<…]
The following columns in the training set don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: ner_tags, id, tokens. If ner_tags, id, tokens are not expected by `BertForTokenClassification.forward`, you can safely ignore this message.
[INFO|trainer.py:2134] 2024-09-05 13:35:09,528 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-09-05 13:35:09,528 >> Num examples = 30,642
[INFO|trainer.py:2136] 2024-09-05 13:35:09,528 >> Num Epochs = 10
[INFO|trainer.py:2137] 2024-09-05 13:35:09,528 >> Instantaneous batch size per device = 32
[INFO|trainer.py:2140] 2024-09-05 13:35:09,528 >> Total train batch size (w. parallel, distributed & accumulation) = 64
[INFO|trainer.py:2141] 2024-09-05 13:35:09,528 >> Gradient Accumulation steps = 2
[INFO|trainer.py:2142] 2024-09-05 13:35:09,528 >> Total optimization steps = 4,790
[INFO|trainer.py:2143] 2024-09-05 13:35:09,529 >> Number of trainable parameters = 109,339,395
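The step count above follows directly from the reported sizes: 30,642 training examples at an effective batch size of 32 × 2 = 64 give ceil(30,642 / 64) = 479 optimizer steps per epoch, and 479 × 10 epochs = 4,790 total optimization steps, which is why the first checkpoint below lands at step 479.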
  0%| | 0/4790 [00:00<…]
The following columns in the evaluation set don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: ner_tags, id, tokens. If ner_tags, id, tokens are not expected by `BertForTokenClassification.forward`, you can safely ignore this message.
[INFO|trainer.py:3819] 2024-09-05 13:37:19,927 >> ***** Running Evaluation *****
[INFO|trainer.py:3821] 2024-09-05 13:37:19,927 >> Num examples = 6798
[INFO|trainer.py:3824] 2024-09-05 13:37:19,927 >> Batch size = 8
  0%| | 0/850 [00:00<…]
Saving model checkpoint to /content/dissertation/scripts/ner/output/checkpoint-479
[INFO|configuration_utils.py:472] 2024-09-05 13:37:33,957 >> Configuration saved in /content/dissertation/scripts/ner/output/checkpoint-479/config.json
[INFO|modeling_utils.py:2799] 2024-09-05 13:37:34,851 >> Model weights saved in /content/dissertation/scripts/ner/output/checkpoint-479/model.safetensors
[INFO|tokenization_utils_base.py:2684] 2024-09-05 13:37:34,852 >> tokenizer config file saved in /content/dissertation/scripts/ner/output/checkpoint-479/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-09-05 13:37:34,852 >> Special tokens file saved in /content/dissertation/scripts/ner/output/checkpoint-479/special_tokens_map.json
[INFO|tokenization_utils_base.py:2684] 2024-09-05 13:37:36,650 >> tokenizer config file saved in /content/dissertation/scripts/ner/output/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-09-05 13:37:36,650 >> Special tokens file saved in /content/dissertation/scripts/ner/output/special_tokens_map.json
 10%|█ | 480/4790 [02:27<6:21:25, 5.31s/it]
 … (per-step progress lines 481–518 omitted; throughput recovers from 5.31 s/it to roughly 3–4 it/s once training resumes after the first-epoch evaluation and checkpoint save) …
 11%|█ | 519/4790 [02:38<18:21, 3.88it/s]
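With metric_for_best_model=f1 and an evaluation after every epoch, model selection hinges on an entity-level F1 score; the earlier "Downloading builder script" line is consistent with the seqeval metric, though that is an assumption, not something the log states. A compute_metrics sketch under that assumption (illustrative only, not taken from the run) could look like:

```python
import numpy as np
import evaluate

seqeval = evaluate.load("seqeval")  # assumption: the downloaded builder script is seqeval
label_list = ["O", "B-FARMACO", "I-FARMACO"]

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # Drop the -100 positions used to mask special tokens and non-first subword pieces.
    true_predictions = [
        [label_list[p] for (p, l) in zip(pred, lab) if l != -100]
        for pred, lab in zip(predictions, labels)
    ]
    true_labels = [
        [label_list[l] for (p, l) in zip(pred, lab) if l != -100]
        for pred, lab in zip(predictions, labels)
    ]
    results = seqeval.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],          # the quantity used for metric_for_best_model=f1
        "accuracy": results["overall_accuracy"],
    }
```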