2025-01-26 23:45:55 - INFO - pointllm.model.pointllm - Using PointBERT.
2025-01-26 23:45:55 - INFO - stdout - Loading PointBERT config from /code/syr/PointLLM/pointllm/model/pointbert/PointTransformer_8192point_2layer.yaml.
2025-01-26 23:45:55 - INFO - pointllm.model.pointllm - Using 6 dim of points.
2025-01-26 23:45:55 - INFO - pointllm.model.pointllm - Use max pool is False. Number of point token is 513.
2025-01-26 23:45:55 - INFO - pointllm.model.pointllm - Point backbone output dim: 384.
2025-01-26 23:45:55 - INFO - pointllm.model.pointllm - Use 2 projection hiddent layers.
2025-01-26 23:45:55 - INFO - pointllm.model.pointllm - Each layer with [1024, 2048] hidden units.
2025-01-26 23:45:55 - INFO - pointllm.model.pointllm - Point projector output dim: 4096.
2025-01-26 23:45:56 - ERROR - stderr - Loading checkpoint shards: 0%| | 0/3 [00:00}
2025-01-26 23:46:22 - ERROR - stderr - These modules will be wrapped as separate FSDP instacnes with mixed precision disabled.
2025-01-26 23:46:22 - ERROR - stderr - warnings.warn(
2025-01-26 23:46:22 - ERROR - stderr - Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:22<00:00, 7.63s/it]
2025-01-26 23:46:22 - ERROR - stderr - Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:22<00:00, 7.65s/it]
2025-01-26 23:46:22 - ERROR - stderr -
2025-01-26 23:46:22 - WARNING - pointllm.train.train - LLM is trainable. Fix_llm flag is set to False
2025-01-26 23:46:22 - INFO - pointllm.train.train - Point backbone is fixed. Fix_pointnet flag is set to True, pointnet grad will not be recorded.
2025-01-26 23:46:22 - INFO - pointllm.train.train - Point projection layer is trainable.
2025-01-26 23:46:22 - INFO - stdout - Loading anno file from /code/syr/PointLLM/yj_data/PLM-Finetune/ready2use/combined_shuffled.json.
2025-01-26 23:46:22 - INFO - stdout - Using conversation_type: ['simple_description']
2025-01-26 23:46:22 - INFO - stdout - Before filtering, the dataset size is: 6.
2025-01-26 23:46:22 - INFO - stdout - After filtering, the dataset size is: 6.
2025-01-26 23:46:22 - INFO - stdout - Number of simple_description: 6
2025-01-26 23:46:22 - INFO - transformers.trainer - Using cuda_amp half precision backend
2025-01-26 23:46:22 - INFO - transformers.trainer - Using cuda_amp half precision backend
2025-01-26 23:46:22 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:118: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type:
2025-01-26 23:46:22 - ERROR - stderr - {}
2025-01-26 23:46:22 - ERROR - stderr - These modules will be wrapped as separate FSDP instacnes with mixed precision disabled.
2025-01-26 23:46:22 - ERROR - stderr - warnings.warn(
2025-01-26 23:46:25 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py:2622: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
2025-01-26 23:46:25 - ERROR - stderr - else torch.cuda.amp.autocast(cache_enabled=cache_enabled, dtype=self.amp_dtype)
2025-01-26 23:46:28 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:46:28 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:46:28 - INFO - transformers.trainer - Num examples = 6
2025-01-26 23:46:28 - INFO - transformers.trainer - Num examples = 6
2025-01-26 23:46:28 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:46:28 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:46:28 - INFO - transformers.trainer - Instantaneous batch size per device = 1
2025-01-26 23:46:28 - INFO - transformers.trainer - Instantaneous batch size per device = 1
2025-01-26 23:46:28 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 128
2025-01-26 23:46:28 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 128
2025-01-26 23:46:28 - INFO - transformers.trainer - Gradient Accumulation steps = 64
2025-01-26 23:46:28 - INFO - transformers.trainer - Gradient Accumulation steps = 64
2025-01-26 23:46:28 - INFO - transformers.trainer - Total optimization steps = 1
2025-01-26 23:46:28 - INFO - transformers.trainer - Total optimization steps = 1
2025-01-26 23:46:28 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:46:28 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:46:28 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:46:28 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:46:33 - ERROR - stderr - wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
2025-01-26 23:46:33 - INFO - wandb - Current SDK version is 0.19.4
2025-01-26 23:46:33 - INFO - wandb - Configure stats pid to 42181
2025-01-26 23:46:33 - INFO - wandb - Loading settings from /root/.config/wandb/settings
2025-01-26 23:46:33 - INFO - wandb - Loading settings from /code/syr/PointLLM/wandb/settings
2025-01-26 23:46:33 - INFO - wandb - Loading settings from environment variables
2025-01-26 23:46:33 - INFO - wandb - Logging user logs to /code/syr/PointLLM/wandb/run-20250126_234633-qw4mkw5m/logs/debug.log
2025-01-26 23:46:33 - INFO - wandb - Logging internal logs to /code/syr/PointLLM/wandb/run-20250126_234633-qw4mkw5m/logs/debug-internal.log
2025-01-26 23:46:33 - INFO - wandb - calling init triggers
2025-01-26 23:46:33 - INFO - wandb - wandb.init called with sweep_config: {} config: {}
2025-01-26 23:46:33 - INFO - wandb - starting backend
2025-01-26 23:46:33 - ERROR - stderr - wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
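For orientation, the point projector dimensions logged at the start of this attempt (384-dim PointBERT features, two hidden layers of 1024 and 2048 units, 4096-dim output matching the LLM hidden_size) correspond to an MLP along the lines of the sketch below; the GELU activations and exact layer layout are assumptions for illustration, not the actual PointLLM implementation.

    import torch.nn as nn

    # Sketch only: a two-hidden-layer projector mapping 384-dim point features
    # into the 4096-dim LLM embedding space. Activation choice is an assumption.
    point_proj = nn.Sequential(
        nn.Linear(384, 1024),
        nn.GELU(),
        nn.Linear(1024, 2048),
        nn.GELU(),
        nn.Linear(2048, 4096),
    )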
2025-01-26 23:46:33 - INFO - wandb - sending inform_init request
2025-01-26 23:46:33 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/pydantic/main.py:314: UserWarning: Pydantic serializer warnings:
2025-01-26 23:46:33 - ERROR - stderr - Expected `list[str]` but got `tuple` - serialized value may not be as expected
2025-01-26 23:46:33 - ERROR - stderr - return self.__pydantic_serializer__.to_python(
2025-01-26 23:46:33 - INFO - wandb - multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2025-01-26 23:46:33 - INFO - wandb - backend started and connected
2025-01-26 23:46:33 - DEBUG - wandb - no default config file found in config-defaults.yaml
2025-01-26 23:46:33 - INFO - wandb - updated telemetry
2025-01-26 23:46:33 - INFO - wandb - communicating run to backend with 90.0 second timeout
2025-01-26 23:46:34 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:46:35 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:46:36 - ERROR - stderr - wandb: | Waiting for wandb.init()...
2025-01-26 23:46:37 - ERROR - stderr - wandb: / Waiting for wandb.init()...
2025-01-26 23:46:38 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:46:39 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:46:40 - ERROR - stderr - wandb: | Waiting for wandb.init()...
2025-01-26 23:46:41 - ERROR - stderr - wandb: / Waiting for wandb.init()...
2025-01-26 23:46:42 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:46:43 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:46:44 - ERROR - stderr - wandb: | Waiting for wandb.init()...
2025-01-26 23:46:45 - ERROR - stderr - wandb: / Waiting for wandb.init()...
2025-01-26 23:46:46 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:46:47 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:46:48 - ERROR - stderr - wandb: | Waiting for wandb.init()...
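The wandb.init() call above keeps spinning and is interrupted by hand in the entries that follow. If syncing to the W&B servers is the bottleneck, the run can be kept offline or the integration disabled before launching; both environment switches below are standard, and the second is quoted verbatim from the transformers log above.

    import os

    os.environ["WANDB_MODE"] = "offline"    # log locally, upload later with `wandb sync`
    os.environ["WANDB_DISABLED"] = "true"   # or skip the W&B callback entirely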
2025-01-26 23:46:49 - WARNING - wandb - interrupted
Traceback (most recent call last):
  File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 1444, in init
    return wi.init(run_settings, run_config)
  File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 920, in init
    result = run_init_handle.wait(
  File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 279, in wait
    found, abandoned = self._slot._get_and_clear(timeout=wait_timeout)
  File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 126, in _get_and_clear
    if self._wait(timeout=timeout):
  File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 122, in _wait
    return self._event.wait(timeout=timeout)
  File "/opt/conda/envs/llava_unet/lib/python3.10/threading.py", line 607, in wait
    signaled = self._cond.wait(timeout)
  File "/opt/conda/envs/llava_unet/lib/python3.10/threading.py", line 324, in wait
    gotit = waiter.acquire(True, timeout)
KeyboardInterrupt
2025-01-26 23:46:49 - ERROR - stderr - Traceback (most recent call last):
2025-01-26 23:46:49 - ERROR - stderr - File "/code/syr/PointLLM/pointllm/train/train_mem.py", line 13, in
2025-01-26 23:46:49 - ERROR - stderr - train()
2025-01-26 23:46:49 - ERROR - stderr - File "/code/syr/PointLLM/pointllm/train/train.py", line 246, in train
2025-01-26 23:46:49 - ERROR - stderr - trainer.train()
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1644, in train
2025-01-26 23:46:49 - ERROR - stderr - return inner_training_loop(
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1827, in _inner_training_loop
2025-01-26 23:46:49 - ERROR - stderr - self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer_callback.py", line 353, in on_train_begin
2025-01-26 23:46:49 - ERROR - stderr - return self.call_event("on_train_begin", args, state, control)
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer_callback.py", line 397, in call_event
2025-01-26 23:46:49 - ERROR - stderr - result = getattr(callback, event)(
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/integrations.py", line 753, in on_train_begin
2025-01-26 23:46:49 - ERROR - stderr - self.setup(args, state, model, **kwargs)
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/integrations.py", line 727, in setup
2025-01-26 23:46:49 - ERROR - stderr - self._wandb.init(
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 1444, in init
2025-01-26 23:46:49 - ERROR - stderr - return wi.init(run_settings, run_config)
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 920, in init
2025-01-26 23:46:49 - ERROR - stderr - result = run_init_handle.wait(
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 279, in wait
2025-01-26 23:46:49 - ERROR - stderr - found, abandoned = self._slot._get_and_clear(timeout=wait_timeout)
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 126, in _get_and_clear
2025-01-26 23:46:49 - ERROR - stderr - if self._wait(timeout=timeout):
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 122, in _wait
2025-01-26 23:46:49 - ERROR - stderr - return self._event.wait(timeout=timeout)
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/threading.py", line 607, in wait
2025-01-26 23:46:49 - ERROR - stderr - signaled = self._cond.wait(timeout)
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/threading.py", line 324, in wait
2025-01-26 23:46:49 - ERROR - stderr - gotit = waiter.acquire(True, timeout)
2025-01-26 23:46:49 - ERROR - stderr - KeyboardInterrupt
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: Traceback (most recent call last):
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/code/syr/PointLLM/pointllm/train/train_mem.py", line 13, in
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: train()
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/code/syr/PointLLM/pointllm/train/train.py", line 246, in train
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: trainer.train()
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1644, in train
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: return inner_training_loop(
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1827, in _inner_training_loop
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer_callback.py", line 353, in on_train_begin
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: return self.call_event("on_train_begin", args, state, control)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer_callback.py", line 397, in call_event
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: result = getattr(callback, event)(
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/integrations.py", line 753, in on_train_begin
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: self.setup(args, state, model, **kwargs)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/integrations.py", line 727, in setup
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: self._wandb.init(
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 1444, in init
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: return wi.init(run_settings, run_config)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 920, in init
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: result = run_init_handle.wait(
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 279, in wait
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: found, abandoned = self._slot._get_and_clear(timeout=wait_timeout)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 126, in _get_and_clear
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: if self._wait(timeout=timeout):
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 122, in _wait
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: return self._event.wait(timeout=timeout)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/threading.py", line 607, in wait
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: signaled = self._cond.wait(timeout)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/threading.py", line 324, in wait
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: gotit = waiter.acquire(True, timeout)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: KeyboardInterrupt
2025-01-26 23:46:49 - WARNING - wandb - message_loop has been closed
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Using PointBERT.
2025-01-26 23:48:51 - INFO - stdout - Loading PointBERT config from /code/syr/PointLLM/pointllm/model/pointbert/PointTransformer_8192point_2layer.yaml.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Using 6 dim of points.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Use max pool is False. Number of point token is 513.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Point backbone output dim: 384.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Use 2 projection hiddent layers.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Using PointBERT.
2025-01-26 23:48:51 - INFO - stdout - Loading PointBERT config from /code/syr/PointLLM/pointllm/model/pointbert/PointTransformer_8192point_2layer.yaml.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Each layer with [1024, 2048] hidden units.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Point projector output dim: 4096.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Using 6 dim of points.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Use max pool is False. Number of point token is 513.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Point backbone output dim: 384.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Use 2 projection hiddent layers.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Each layer with [1024, 2048] hidden units.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Point projector output dim: 4096.
2025-01-26 23:48:52 - ERROR - stderr - Loading checkpoint shards: 0%| | 0/3 [00:00}
2025-01-26 23:49:12 - ERROR - stderr - These modules will be wrapped as separate FSDP instacnes with mixed precision disabled.
2025-01-26 23:49:12 - ERROR - stderr - warnings.warn(
2025-01-26 23:49:12 - ERROR - stderr - Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████| 3/3 [00:20<00:00, 6.57s/it]
2025-01-26 23:49:12 - ERROR - stderr - Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████| 3/3 [00:20<00:00, 6.90s/it]
2025-01-26 23:49:12 - ERROR - stderr -
2025-01-26 23:49:12 - WARNING - pointllm.train.train - LLM is trainable. Fix_llm flag is set to False
2025-01-26 23:49:13 - INFO - pointllm.train.train - Point backbone is fixed. Fix_pointnet flag is set to True, pointnet grad will not be recorded.
2025-01-26 23:49:13 - INFO - pointllm.train.train - Point projection layer is trainable.
2025-01-26 23:49:13 - INFO - stdout - Loading anno file from /code/syr/PointLLM/yj_data/PLM-Finetune/ready2use/combined_shuffled.json.
2025-01-26 23:49:13 - INFO - stdout - Using conversation_type: ['simple_description']
2025-01-26 23:49:13 - INFO - stdout - Before filtering, the dataset size is: 6.
2025-01-26 23:49:13 - INFO - stdout - After filtering, the dataset size is: 6.
2025-01-26 23:49:13 - INFO - stdout - Number of simple_description: 6
2025-01-26 23:49:13 - INFO - transformers.trainer - Using cuda_amp half precision backend
2025-01-26 23:49:13 - INFO - transformers.trainer - Using cuda_amp half precision backend
2025-01-26 23:49:13 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:118: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type:
2025-01-26 23:49:13 - ERROR - stderr - {}
2025-01-26 23:49:13 - ERROR - stderr - These modules will be wrapped as separate FSDP instacnes with mixed precision disabled.
2025-01-26 23:49:13 - ERROR - stderr - warnings.warn(
2025-01-26 23:49:19 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py:2622: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
2025-01-26 23:49:19 - ERROR - stderr - else torch.cuda.amp.autocast(cache_enabled=cache_enabled, dtype=self.amp_dtype)
2025-01-26 23:49:20 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:49:20 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:49:20 - INFO - transformers.trainer - Num examples = 6
2025-01-26 23:49:20 - INFO - transformers.trainer - Num examples = 6
2025-01-26 23:49:20 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:49:20 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:49:20 - INFO - transformers.trainer - Instantaneous batch size per device = 1
2025-01-26 23:49:20 - INFO - transformers.trainer - Instantaneous batch size per device = 1
2025-01-26 23:49:20 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 128
2025-01-26 23:49:20 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 128
2025-01-26 23:49:20 - INFO - transformers.trainer - Gradient Accumulation steps = 64
2025-01-26 23:49:20 - INFO - transformers.trainer - Gradient Accumulation steps = 64
2025-01-26 23:49:20 - INFO - transformers.trainer - Total optimization steps = 1
2025-01-26 23:49:20 - INFO - transformers.trainer - Total optimization steps = 1
2025-01-26 23:49:20 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:49:20 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:49:20 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:49:20 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:49:22 - ERROR - stderr - wandb: Currently logged in as: 1282467298 (1282467298-university-of-nottingham-ningbo-china). Use `wandb login --relogin` to force relogin
2025-01-26 23:49:22 - INFO - wandb - Current SDK version is 0.19.4
2025-01-26 23:49:22 - INFO - wandb - Configure stats pid to 44018
2025-01-26 23:49:22 - INFO - wandb - Loading settings from /root/.config/wandb/settings
2025-01-26 23:49:22 - INFO - wandb - Loading settings from /code/syr/PointLLM/wandb/settings
2025-01-26 23:49:22 - INFO - wandb - Loading settings from environment variables
2025-01-26 23:49:22 - INFO - wandb - Logging user logs to /code/syr/PointLLM/wandb/run-20250126_234922-i2eep5vt/logs/debug.log
2025-01-26 23:49:22 - INFO - wandb - Logging internal logs to /code/syr/PointLLM/wandb/run-20250126_234922-i2eep5vt/logs/debug-internal.log
2025-01-26 23:49:22 - INFO - wandb - calling init triggers
2025-01-26 23:49:22 - INFO - wandb - wandb.init called with sweep_config: {} config: {}
2025-01-26 23:49:22 - INFO - wandb - starting backend
2025-01-26 23:49:22 - ERROR - stderr - wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
2025-01-26 23:49:22 - INFO - wandb - sending inform_init request
2025-01-26 23:49:22 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/pydantic/main.py:314: UserWarning: Pydantic serializer warnings:
2025-01-26 23:49:22 - ERROR - stderr - Expected `list[str]` but got `tuple` - serialized value may not be as expected
2025-01-26 23:49:22 - ERROR - stderr - return self.__pydantic_serializer__.to_python(
2025-01-26 23:49:22 - INFO - wandb - multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2025-01-26 23:49:22 - INFO - wandb - backend started and connected
2025-01-26 23:49:22 - DEBUG - wandb - no default config file found in config-defaults.yaml
2025-01-26 23:49:22 - INFO - wandb - updated telemetry
2025-01-26 23:49:22 - INFO - wandb - communicating run to backend with 90.0 second timeout
2025-01-26 23:49:23 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:49:24 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:49:25 - ERROR - stderr - wandb: | Waiting for wandb.init()...
2025-01-26 23:49:26 - ERROR - stderr - wandb: / Waiting for wandb.init()...
2025-01-26 23:49:27 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:49:28 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:49:29 - ERROR - stderr - wandb: | Waiting for wandb.init()...
2025-01-26 23:49:30 - ERROR - stderr - wandb: / Waiting for wandb.init()...
2025-01-26 23:49:31 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:49:32 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:49:33 - ERROR - stderr - wandb: | Waiting for wandb.init()...
2025-01-26 23:49:34 - ERROR - stderr - wandb: / Waiting for wandb.init()...
2025-01-26 23:49:35 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:49:35 - INFO - wandb - starting run threads in backend
2025-01-26 23:49:35 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/pydantic/main.py:314: UserWarning: Pydantic serializer warnings:
2025-01-26 23:49:35 - ERROR - stderr - Expected `list[str]` but got `tuple` - serialized value may not be as expected
2025-01-26 23:49:35 - ERROR - stderr - return self.__pydantic_serializer__.to_python(
2025-01-26 23:49:35 - ERROR - stderr - wandb: Tracking run with wandb version 0.19.4
2025-01-26 23:49:35 - ERROR - stderr - wandb: Run data is saved locally in /code/syr/PointLLM/wandb/run-20250126_234922-i2eep5vt
2025-01-26 23:49:35 - ERROR - stderr - wandb: Run `wandb offline` to turn off syncing.
2025-01-26 23:49:35 - ERROR - stderr - wandb: Syncing run PointLLM_train_stage2
2025-01-26 23:49:35 - ERROR - stderr - wandb: ⭐️ View project at https://wandb.ai/1282467298-university-of-nottingham-ningbo-china/huggingface
2025-01-26 23:49:35 - ERROR - stderr - wandb: 🚀 View run at https://wandb.ai/1282467298-university-of-nottingham-ningbo-china/huggingface/runs/i2eep5vt
2025-01-26 23:49:35 - DEBUG - wandb - Saving list of pip packages installed into the current environment
2025-01-26 23:49:35 - INFO - wandb - atexit reg
2025-01-26 23:49:35 - INFO - wandb - redirect: wrap_raw
2025-01-26 23:49:35 - INFO - wandb - Wrapping output streams.
2025-01-26 23:49:35 - INFO - wandb - Redirects installed.
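The FutureWarning emitted from transformers/trainer.py:2622 earlier in this attempt points at the renamed autocast entry point; below is a minimal sketch of the suggested spelling, with bfloat16 assumed to match the bf16=True setting in this run's config.

    import torch

    # torch.amp.autocast replaces the deprecated torch.cuda.amp.autocast spelling.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(4, 4, device=device)
    with torch.amp.autocast(device, dtype=torch.bfloat16):
        y = x @ x  # matmul runs under autocast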
2025-01-26 23:49:35 - INFO - wandb - run started, returning control to user process 2025-01-26 23:49:35 - INFO - wandb - config_cb None None {'vocab_size': 32003, 'hidden_size': 4096, 'intermediate_size': 11008, 'num_hidden_layers': 32, 'num_attention_heads': 32, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'use_cache': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float32', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': False, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'chunk_size_feed_forward': 0, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['PointLLMLlamaForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 1, 'pad_token_id': 0, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': '/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2', 'transformers_version': '4.28.0.dev0', 'DEFAULT_POINT_END_TOKEN': '', 'DEFAULT_POINT_PATCH_TOKEN': '', 'DEFAULT_POINT_START_TOKEN': '', 'max_position_embeddings': 2048, 'mm_use_point_start_end': True, 'model_type': 'pointllm', 'point_backbone': 'PointBERT', 'point_backbone_ckpt': '/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2/point_bert_v1.2.pt', 'point_backbone_config_name': 'PointTransformer_8192point_2layer', 'use_color': True, 'output_dir': 'outputs/PointLLM_train_stage2/test_stage2', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'evaluation_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 1, 'per_gpu_train_batch_size': 'None', 'per_gpu_eval_batch_size': 'None', 'gradient_accumulation_steps': 64, 'eval_accumulation_steps': 'None', 'eval_delay': 0, 'learning_rate': 3e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 1, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'warmup_ratio': 0.03, 'warmup_steps': 0, 'log_level': 'info', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': 'outputs/PointLLM_train_stage2/test_stage2/runs/Jan26_23-48-13_audio-73426-task1-0', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 1, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 300, 'save_total_limit': 1, 'save_on_each_node': False, 'no_cuda': False, 'use_mps_device': False, 'seed': 42, 'data_seed': 'None', 'jit_mode_eval': False, 'use_ipex': False, 'bf16': True, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'cuda_amp', 'bf16_full_eval': False, 'fp16_full_eval': 
False, 'tf32': 'None', 'local_rank': 0, 'xpu_backend': 'None', 'tpu_num_cores': 'None', 'tpu_metrics_debug': False, 'debug': '[]', 'dataloader_drop_last': False, 'eval_steps': 100, 'dataloader_num_workers': 0, 'past_index': -1, 'run_name': 'PointLLM_train_stage2', 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': 'None', 'load_best_model_at_end': False, 'metric_for_best_model': 'None', 'greater_is_better': 'None', 'ignore_data_skip': False, 'sharded_ddp': '[]', 'fsdp': "['full_shard', 'auto_wrap']", 'fsdp_min_num_params': 0, 'fsdp_config': "{'fsdp_min_num_params': 0, 'fsdp_transformer_layer_cls_to_wrap': ['LlamaDecoderLayer'], 'xla': False, 'xla_fsdp_grad_ckpt': False}", 'fsdp_transformer_layer_cls_to_wrap': 'LlamaDecoderLayer', 'deepspeed': 'None', 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': 'None', 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': "['wandb']", 'ddp_find_unused_parameters': 'None', 'ddp_bucket_cap_mb': 'None', 'dataloader_pin_memory': True, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': 'None', 'hub_model_id': 'None', 'hub_strategy': 'every_save', 'hub_token': '', 'hub_private_repo': False, 'gradient_checkpointing': True, 'include_inputs_for_metrics': False, 'fp16_backend': 'auto', 'push_to_hub_model_id': 'None', 'push_to_hub_organization': 'None', 'push_to_hub_token': '', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': 'None', 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': 'None', 'torch_compile_mode': 'None', 'cache_dir': '/code/syr/PointLLM/cache_dir', 'model_max_length': 2048, 'model_debug': False, 'fix_llm': False, 'fix_pointnet': True, 'force_fsdp': False, 'tune_mm_mlp_adapter': True, 'stage_2': True, 'pretrained_mm_mlp_adapter': '/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2/point_proj.bin', 'detatch_point_token': '', 'train_batch_size': 1, 'eval_batch_size': 1}
2025-01-26 23:49:35 - ERROR - stderr - 0%| | 0/1 [00:00
2025-01-26 23:49:44 - ERROR - stderr - train()
2025-01-26 23:49:44 - ERROR - stderr - File "/code/syr/PointLLM/pointllm/train/train.py", line 246, in train
2025-01-26 23:49:44 - ERROR - stderr - trainer.train()
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1644, in train
2025-01-26 23:49:44 - ERROR - stderr - return inner_training_loop(
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1978, in _inner_training_loop
2025-01-26 23:49:44 - ERROR - stderr - self.optimizer.step()
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
2025-01-26 23:49:44 - ERROR - stderr - return func.__get__(opt, opt.__class__)(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 487, in wrapper
2025-01-26 23:49:44 - ERROR - stderr - out = func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 91, in _use_grad
2025-01-26 23:49:44 - ERROR - stderr - ret = func(self, *args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 220, in step
2025-01-26 23:49:44 - ERROR - stderr - adamw(
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
2025-01-26 23:49:44 - ERROR - stderr - return func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 782, in adamw
2025-01-26 23:49:44 - ERROR - stderr - func(
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
2025-01-26 23:49:44 - ERROR - stderr - grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
2025-01-26 23:49:44 - ERROR - stderr - return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices) # type: ignore[return-value, arg-type]
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-01-26 23:49:44 - ERROR - stderr - return func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
2025-01-26 23:49:44 - ERROR - stderr - return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
2025-01-26 23:49:44 - ERROR - stderr - RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: Traceback (most recent call last):
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/code/syr/PointLLM/pointllm/train/train_mem.py", line 13, in
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: train()
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/code/syr/PointLLM/pointllm/train/train.py", line 246, in train
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: trainer.train()
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1644, in train
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: return inner_training_loop(
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1978, in _inner_training_loop
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: self.optimizer.step()
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: return func.__get__(opt, opt.__class__)(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 487, in wrapper
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: out = func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 91, in _use_grad
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: ret = func(self, *args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 220, in step
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: adamw(
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: return func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 782, in adamw
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: func(
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices) # type: ignore[return-value, arg-type]
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: return func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
2025-01-26 23:49:44 - WARNING - wandb - message_loop has been closed
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: Traceback (most recent call last):
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/code/syr/PointLLM/pointllm/train/train_mem.py", line 13, in
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: train()
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/code/syr/PointLLM/pointllm/train/train.py", line 246, in train
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: trainer.train()
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1644, in train
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: return inner_training_loop(
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1978, in _inner_training_loop
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: self.optimizer.step()
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: return func.__get__(opt, opt.__class__)(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 487, in wrapper
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: out = func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 91, in _use_grad
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: ret = func(self, *args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 220, in step
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: adamw(
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: return func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 782, in adamw
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: func(
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices) # type: ignore[return-value, arg-type]
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: return func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
2025-01-26 23:52:51 - INFO - pointllm.model.pointllm - Using PointBERT.
2025-01-26 23:52:51 - INFO - stdout - Loading PointBERT config from /code/syr/PointLLM/pointllm/model/pointbert/PointTransformer_8192point_2layer.yaml.
2025-01-26 23:52:51 - INFO - pointllm.model.pointllm - Using 6 dim of points.
2025-01-26 23:52:51 - INFO - pointllm.model.pointllm - Use max pool is False. Number of point token is 513.
2025-01-26 23:52:51 - INFO - pointllm.model.pointllm - Point backbone output dim: 384.
2025-01-26 23:52:51 - INFO - pointllm.model.pointllm - Use 2 projection hiddent layers.
2025-01-26 23:52:51 - INFO - pointllm.model.pointllm - Using PointBERT.
2025-01-26 23:52:51 - INFO - stdout - Loading PointBERT config from /code/syr/PointLLM/pointllm/model/pointbert/PointTransformer_8192point_2layer.yaml.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Each layer with [1024, 2048] hidden units.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Point projector output dim: 4096.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Using 6 dim of points.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Use max pool is False. Number of point token is 513.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Point backbone output dim: 384.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Use 2 projection hiddent layers.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Each layer with [1024, 2048] hidden units.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Point projector output dim: 4096.
2025-01-26 23:52:52 - ERROR - stderr - Loading checkpoint shards: 0%| | 0/3 [00:00}
2025-01-26 23:53:11 - ERROR - stderr - These modules will be wrapped as separate FSDP instacnes with mixed precision disabled.
2025-01-26 23:53:11 - ERROR - stderr - warnings.warn(
2025-01-26 23:53:13 - ERROR - stderr - Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:20<00:00, 6.52s/it]
2025-01-26 23:53:13 - ERROR - stderr - Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:20<00:00, 6.76s/it]
2025-01-26 23:53:13 - ERROR - stderr -
2025-01-26 23:53:13 - WARNING - pointllm.train.train - LLM is trainable. Fix_llm flag is set to False
2025-01-26 23:53:13 - INFO - pointllm.train.train - Point backbone is fixed. Fix_pointnet flag is set to True, pointnet grad will not be recorded.
2025-01-26 23:53:13 - INFO - pointllm.train.train - Point projection layer is trainable.
2025-01-26 23:53:13 - INFO - stdout - Loading anno file from /code/syr/PointLLM/yj_data/PLM-Finetune/ready2use/combined_shuffled.json.
2025-01-26 23:53:13 - INFO - stdout - Using conversation_type: ['simple_description']
2025-01-26 23:53:13 - INFO - stdout - Before filtering, the dataset size is: 6.
2025-01-26 23:53:13 - INFO - stdout - After filtering, the dataset size is: 6.
2025-01-26 23:53:13 - INFO - stdout - Number of simple_description: 6
2025-01-26 23:53:13 - INFO - transformers.trainer - Using cuda_amp half precision backend
2025-01-26 23:53:13 - INFO - transformers.trainer - Using cuda_amp half precision backend
2025-01-26 23:53:13 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:118: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type:
2025-01-26 23:53:13 - ERROR - stderr - {}
2025-01-26 23:53:13 - ERROR - stderr - These modules will be wrapped as separate FSDP instacnes with mixed precision disabled.
2025-01-26 23:53:13 - ERROR - stderr - warnings.warn(
2025-01-26 23:53:16 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py:2622: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
2025-01-26 23:53:16 - ERROR - stderr - else torch.cuda.amp.autocast(cache_enabled=cache_enabled, dtype=self.amp_dtype)
2025-01-26 23:53:19 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:53:19 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:53:19 - INFO - transformers.trainer - Num examples = 6
2025-01-26 23:53:19 - INFO - transformers.trainer - Num examples = 6
2025-01-26 23:53:19 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:53:19 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:53:19 - INFO - transformers.trainer - Instantaneous batch size per device = 2
2025-01-26 23:53:19 - INFO - transformers.trainer - Instantaneous batch size per device = 2
2025-01-26 23:53:19 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 4
2025-01-26 23:53:19 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 4
2025-01-26 23:53:19 - INFO - transformers.trainer - Gradient Accumulation steps = 1
2025-01-26 23:53:19 - INFO - transformers.trainer - Gradient Accumulation steps = 1
2025-01-26 23:53:19 - INFO - transformers.trainer - Total optimization steps = 2
2025-01-26 23:53:19 - INFO - transformers.trainer - Total optimization steps = 2
2025-01-26 23:53:19 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:53:19 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:53:19 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:53:19 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:53:24 - ERROR - stderr - wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
2025-01-26 23:53:24 - INFO - wandb - Current SDK version is 0.19.4
2025-01-26 23:53:24 - INFO - wandb - Configure stats pid to 46170
2025-01-26 23:53:24 - INFO - wandb - Loading settings from /root/.config/wandb/settings
2025-01-26 23:53:24 - INFO - wandb - Loading settings from /code/syr/PointLLM/wandb/settings
2025-01-26 23:53:24 - INFO - wandb - Loading settings from environment variables
2025-01-26 23:53:24 - INFO - wandb - Logging user logs to /code/syr/PointLLM/wandb/run-20250126_235324-odc8wu1v/logs/debug.log
2025-01-26 23:53:24 - INFO - wandb - Logging internal logs to /code/syr/PointLLM/wandb/run-20250126_235324-odc8wu1v/logs/debug-internal.log
2025-01-26 23:53:24 - INFO - wandb - calling init triggers
2025-01-26 23:53:24 - INFO - wandb - wandb.init called with sweep_config: {} config: {}
2025-01-26 23:53:24 - INFO - wandb - starting backend
2025-01-26 23:53:24 - ERROR - stderr - wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
2025-01-26 23:53:24 - INFO - wandb - sending inform_init request
2025-01-26 23:53:24 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/pydantic/main.py:314: UserWarning: Pydantic serializer warnings:
2025-01-26 23:53:24 - ERROR - stderr - Expected `list[str]` but got `tuple` - serialized value may not be as expected
2025-01-26 23:53:24 - ERROR - stderr - return self.__pydantic_serializer__.to_python(
2025-01-26 23:53:24 - INFO - wandb - multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2025-01-26 23:53:24 - INFO - wandb - backend started and connected
2025-01-26 23:53:24 - DEBUG - wandb - no default config file found in config-defaults.yaml
2025-01-26 23:53:24 - INFO - wandb - updated telemetry
2025-01-26 23:53:24 - INFO - wandb - communicating run to backend with 90.0 second timeout
2025-01-26 23:53:25 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:53:25 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:53:25 - INFO - wandb - starting run threads in backend
2025-01-26 23:53:25 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/pydantic/main.py:314: UserWarning: Pydantic serializer warnings:
2025-01-26 23:53:25 - ERROR - stderr - Expected `list[str]` but got `tuple` - serialized value may not be as expected
2025-01-26 23:53:25 - ERROR - stderr - return self.__pydantic_serializer__.to_python(
2025-01-26 23:53:25 - ERROR - stderr - wandb: Tracking run with wandb version 0.19.4
2025-01-26 23:53:25 - ERROR - stderr - wandb: Run data is saved locally in /code/syr/PointLLM/wandb/run-20250126_235324-odc8wu1v
2025-01-26 23:53:25 - ERROR - stderr - wandb: Run `wandb offline` to turn off syncing.
2025-01-26 23:53:25 - ERROR - stderr - wandb: Syncing run PointLLM_train_stage2
2025-01-26 23:53:25 - ERROR - stderr - wandb: ⭐️ View project at https://wandb.ai/1282467298-university-of-nottingham-ningbo-china/huggingface
2025-01-26 23:53:25 - ERROR - stderr - wandb: 🚀 View run at https://wandb.ai/1282467298-university-of-nottingham-ningbo-china/huggingface/runs/odc8wu1v
2025-01-26 23:53:25 - DEBUG - wandb - Saving list of pip packages installed into the current environment
2025-01-26 23:53:26 - INFO - wandb - atexit reg
2025-01-26 23:53:26 - INFO - wandb - redirect: wrap_raw
2025-01-26 23:53:26 - INFO - wandb - Wrapping output streams.
2025-01-26 23:53:26 - INFO - wandb - Redirects installed.
2025-01-26 23:53:26 - INFO - wandb - run started, returning control to user process
2025-01-26 23:53:26 - INFO - wandb - config_cb None None {'vocab_size': 32003, 'hidden_size': 4096, 'intermediate_size': 11008, 'num_hidden_layers': 32, 'num_attention_heads': 32, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'use_cache': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float32', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': False, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'chunk_size_feed_forward': 0, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['PointLLMLlamaForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 1, 'pad_token_id': 0, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': '/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2', 'transformers_version': '4.28.0.dev0', 'DEFAULT_POINT_END_TOKEN': '', 'DEFAULT_POINT_PATCH_TOKEN': '', 'DEFAULT_POINT_START_TOKEN': '', 'max_position_embeddings': 2048, 'mm_use_point_start_end': True, 'model_type': 'pointllm', 'point_backbone': 'PointBERT', 'point_backbone_ckpt':
'/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2/point_bert_v1.2.pt', 'point_backbone_config_name': 'PointTransformer_8192point_2layer', 'use_color': True, 'output_dir': 'outputs/PointLLM_train_stage2/test_stage2', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'evaluation_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 2, 'per_device_eval_batch_size': 1, 'per_gpu_train_batch_size': 'None', 'per_gpu_eval_batch_size': 'None', 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': 'None', 'eval_delay': 0, 'learning_rate': 3e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 1, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'warmup_ratio': 0.03, 'warmup_steps': 0, 'log_level': 'info', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': 'outputs/PointLLM_train_stage2/test_stage2/runs/Jan26_23-52-14_audio-73426-task1-0', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 1, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 300, 'save_total_limit': 1, 'save_on_each_node': False, 'no_cuda': False, 'use_mps_device': False, 'seed': 42, 'data_seed': 'None', 'jit_mode_eval': False, 'use_ipex': False, 'bf16': True, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'cuda_amp', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': 'None', 'local_rank': 0, 'xpu_backend': 'None', 'tpu_num_cores': 'None', 'tpu_metrics_debug': False, 'debug': '[]', 'dataloader_drop_last': False, 'eval_steps': 100, 'dataloader_num_workers': 0, 'past_index': -1, 'run_name': 'PointLLM_train_stage2', 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': 'None', 'load_best_model_at_end': False, 'metric_for_best_model': 'None', 'greater_is_better': 'None', 'ignore_data_skip': False, 'sharded_ddp': '[]', 'fsdp': "['full_shard', 'auto_wrap']", 'fsdp_min_num_params': 0, 'fsdp_config': "{'fsdp_min_num_params': 0, 'fsdp_transformer_layer_cls_to_wrap': ['LlamaDecoderLayer'], 'xla': False, 'xla_fsdp_grad_ckpt': False}", 'fsdp_transformer_layer_cls_to_wrap': 'LlamaDecoderLayer', 'deepspeed': 'None', 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': 'None', 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': "['wandb']", 'ddp_find_unused_parameters': 'None', 'ddp_bucket_cap_mb': 'None', 'dataloader_pin_memory': True, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': 'None', 'hub_model_id': 'None', 'hub_strategy': 'every_save', 'hub_token': '', 'hub_private_repo': False, 'gradient_checkpointing': True, 'include_inputs_for_metrics': False, 'fp16_backend': 'auto', 'push_to_hub_model_id': 'None', 'push_to_hub_organization': 'None', 'push_to_hub_token': '', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': 'None', 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': 'None', 'torch_compile_mode': 'None', 'cache_dir': '/code/syr/PointLLM/cache_dir', 'model_max_length': 2048, 'model_debug': False, 'fix_llm': False, 'fix_pointnet': True, 'force_fsdp': False, 'tune_mm_mlp_adapter': True, 'stage_2': True, 'pretrained_mm_mlp_adapter': '/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2/point_proj.bin', 'detatch_point_token': '', 'train_batch_size': 2, 'eval_batch_size': 1} 2025-01-26 23:53:26 - 
ERROR - stderr - 0%| | 0/2 [00:00}
2025-01-26 23:56:48 - ERROR - stderr - These modules will be wrapped as separate FSDP instances with mixed precision disabled.
2025-01-26 23:56:48 - ERROR - stderr - warnings.warn(
2025-01-26 23:56:48 - ERROR - stderr - Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:26<00:00, 8.54s/it]
2025-01-26 23:56:48 - ERROR - stderr - Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:26<00:00, 8.79s/it]
2025-01-26 23:56:48 - ERROR - stderr -
2025-01-26 23:56:48 - WARNING - pointllm.train.train - LLM is trainable. Fix_llm flag is set to False
2025-01-26 23:56:48 - INFO - pointllm.train.train - Point backbone is fixed. Fix_pointnet flag is set to True, pointnet grad will not be recorded.
2025-01-26 23:56:48 - INFO - pointllm.train.train - Point projection layer is trainable.
2025-01-26 23:56:48 - INFO - stdout - Loading anno file from /code/syr/PointLLM/yj_data/PLM-Finetune/ready2use/combined_shuffled_orin.json.
2025-01-26 23:56:49 - INFO - stdout - Using conversation_type: ['simple_description']
2025-01-26 23:56:49 - INFO - stdout - Before filtering, the dataset size is: 60000.
2025-01-26 23:56:49 - INFO - stdout - After filtering, the dataset size is: 60000.
2025-01-26 23:56:49 - INFO - stdout - Number of simple_description: 60000
2025-01-26 23:56:49 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:118: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type:
2025-01-26 23:56:49 - ERROR - stderr - {}
2025-01-26 23:56:49 - ERROR - stderr - These modules will be wrapped as separate FSDP instances with mixed precision disabled.
2025-01-26 23:56:49 - ERROR - stderr - warnings.warn(
2025-01-26 23:56:54 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:56:54 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:56:54 - INFO - transformers.trainer - Num examples = 60000
2025-01-26 23:56:54 - INFO - transformers.trainer - Num examples = 60000
2025-01-26 23:56:54 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:56:54 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:56:54 - INFO - transformers.trainer - Instantaneous batch size per device = 2
2025-01-26 23:56:54 - INFO - transformers.trainer - Instantaneous batch size per device = 2
2025-01-26 23:56:54 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 128
2025-01-26 23:56:54 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 128
2025-01-26 23:56:54 - INFO - transformers.trainer - Gradient Accumulation steps = 32
2025-01-26 23:56:54 - INFO - transformers.trainer - Gradient Accumulation steps = 32
2025-01-26 23:56:54 - INFO - transformers.trainer - Total optimization steps = 468
2025-01-26 23:56:54 - INFO - transformers.trainer - Total optimization steps = 468
2025-01-26 23:56:54 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:56:54 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:56:54 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:56:54 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:56:54 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py:2622: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
2025-01-26 23:56:54 - ERROR - stderr - else torch.cuda.amp.autocast(cache_enabled=cache_enabled, dtype=self.amp_dtype)
2025-01-26 23:56:56 - ERROR - stderr - wandb: Currently logged in as: 1282467298 (1282467298-university-of-nottingham-ningbo-china). Use `wandb login --relogin` to force relogin
2025-01-26 23:56:56 - INFO - wandb - Current SDK version is 0.19.4
2025-01-26 23:56:56 - INFO - wandb - Configure stats pid to 47909
2025-01-26 23:56:56 - INFO - wandb - Loading settings from /root/.config/wandb/settings
2025-01-26 23:56:56 - INFO - wandb - Loading settings from /code/syr/PointLLM/wandb/settings
2025-01-26 23:56:56 - INFO - wandb - Loading settings from environment variables
2025-01-26 23:56:56 - INFO - wandb - Logging user logs to /code/syr/PointLLM/wandb/run-20250126_235656-w1cwgs9g/logs/debug.log
2025-01-26 23:56:56 - INFO - wandb - Logging internal logs to /code/syr/PointLLM/wandb/run-20250126_235656-w1cwgs9g/logs/debug-internal.log
2025-01-26 23:56:56 - INFO - wandb - calling init triggers
2025-01-26 23:56:56 - INFO - wandb - wandb.init called with sweep_config: {} config: {}
2025-01-26 23:56:56 - INFO - wandb - starting backend
2025-01-26 23:56:56 - ERROR - stderr - wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
2025-01-26 23:56:56 - INFO - wandb - sending inform_init request
2025-01-26 23:56:56 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/pydantic/main.py:314: UserWarning: Pydantic serializer warnings:
2025-01-26 23:56:56 - ERROR - stderr - Expected `list[str]` but got `tuple` - serialized value may not be as expected
2025-01-26 23:56:56 - ERROR - stderr - return self.__pydantic_serializer__.to_python(
2025-01-26 23:56:56 - INFO - wandb - multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2025-01-26 23:56:56 - INFO - wandb - backend started and connected
2025-01-26 23:56:56 - DEBUG - wandb - no default config file found in config-defaults.yaml
2025-01-26 23:56:56 - INFO - wandb - updated telemetry
2025-01-26 23:56:56 - INFO - wandb - communicating run to backend with 90.0 second timeout
2025-01-26 23:56:57 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:56:57 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
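The FutureWarning above comes from the transformers 4.28-era trainer still using the old `torch.cuda.amp.autocast` spelling; it is harmless for this run, and on recent PyTorch the equivalent call just takes the device type explicitly. A quick illustration (bfloat16 chosen to match bf16=True in this config):

```python
import torch

# Old spelling, still functional but deprecated (what the warning points at):
with torch.cuda.amp.autocast(dtype=torch.bfloat16, cache_enabled=True):
    ...  # forward pass under autocast

# Replacement recommended by the warning:
with torch.amp.autocast("cuda", dtype=torch.bfloat16, cache_enabled=True):
    ...  # same behavior, explicit device type
```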
2025-01-26 23:56:57 - INFO - wandb - starting run threads in backend
2025-01-26 23:56:57 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/pydantic/main.py:314: UserWarning: Pydantic serializer warnings:
2025-01-26 23:56:57 - ERROR - stderr - Expected `list[str]` but got `tuple` - serialized value may not be as expected
2025-01-26 23:56:57 - ERROR - stderr - return self.__pydantic_serializer__.to_python(
2025-01-26 23:56:57 - ERROR - stderr - wandb: Tracking run with wandb version 0.19.4
2025-01-26 23:56:57 - ERROR - stderr - wandb: Run data is saved locally in /code/syr/PointLLM/wandb/run-20250126_235656-w1cwgs9g
2025-01-26 23:56:57 - ERROR - stderr - wandb: Run `wandb offline` to turn off syncing.
2025-01-26 23:56:57 - ERROR - stderr - wandb: Syncing run PointLLM_train_stage2
2025-01-26 23:56:57 - ERROR - stderr - wandb: ⭐️ View project at https://wandb.ai/1282467298-university-of-nottingham-ningbo-china/huggingface
2025-01-26 23:56:57 - ERROR - stderr - wandb: 🚀 View run at https://wandb.ai/1282467298-university-of-nottingham-ningbo-china/huggingface/runs/w1cwgs9g
2025-01-26 23:56:57 - DEBUG - wandb - Saving list of pip packages installed into the current environment
2025-01-26 23:56:57 - INFO - wandb - atexit reg
2025-01-26 23:56:57 - INFO - wandb - redirect: wrap_raw
2025-01-26 23:56:57 - INFO - wandb - Wrapping output streams.
2025-01-26 23:56:57 - INFO - wandb - Redirects installed.
2025-01-26 23:56:57 - INFO - wandb - run started, returning control to user process
2025-01-26 23:56:57 - INFO - wandb - config_cb None None {'vocab_size': 32003, 'hidden_size': 4096, 'intermediate_size': 11008, 'num_hidden_layers': 32, 'num_attention_heads': 32, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'use_cache': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float32', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': False, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'chunk_size_feed_forward': 0, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['PointLLMLlamaForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 1, 'pad_token_id': 0, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': '/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2', 'transformers_version': '4.28.0.dev0', 'DEFAULT_POINT_END_TOKEN': '', 'DEFAULT_POINT_PATCH_TOKEN': '', 'DEFAULT_POINT_START_TOKEN': '', 'max_position_embeddings': 2048, 'mm_use_point_start_end': True, 'model_type': 'pointllm', 'point_backbone': 'PointBERT', 'point_backbone_ckpt':
'/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2/point_bert_v1.2.pt', 'point_backbone_config_name': 'PointTransformer_8192point_2layer', 'use_color': True, 'output_dir': 'outputs/PointLLM_train_stage2/test_stage2', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'evaluation_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 2, 'per_device_eval_batch_size': 1, 'per_gpu_train_batch_size': 'None', 'per_gpu_eval_batch_size': 'None', 'gradient_accumulation_steps': 32, 'eval_accumulation_steps': 'None', 'eval_delay': 0, 'learning_rate': 3e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 1, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'warmup_ratio': 0.03, 'warmup_steps': 0, 'log_level': 'info', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': 'outputs/PointLLM_train_stage2/test_stage2/runs/Jan26_23-55-42_audio-73426-task1-0', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 1, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 300, 'save_total_limit': 1, 'save_on_each_node': False, 'no_cuda': False, 'use_mps_device': False, 'seed': 42, 'data_seed': 'None', 'jit_mode_eval': False, 'use_ipex': False, 'bf16': True, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'cuda_amp', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': 'None', 'local_rank': 0, 'xpu_backend': 'None', 'tpu_num_cores': 'None', 'tpu_metrics_debug': False, 'debug': '[]', 'dataloader_drop_last': False, 'eval_steps': 100, 'dataloader_num_workers': 0, 'past_index': -1, 'run_name': 'PointLLM_train_stage2', 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': 'None', 'load_best_model_at_end': False, 'metric_for_best_model': 'None', 'greater_is_better': 'None', 'ignore_data_skip': False, 'sharded_ddp': '[]', 'fsdp': "['full_shard', 'auto_wrap']", 'fsdp_min_num_params': 0, 'fsdp_config': "{'fsdp_min_num_params': 0, 'fsdp_transformer_layer_cls_to_wrap': ['LlamaDecoderLayer'], 'xla': False, 'xla_fsdp_grad_ckpt': False}", 'fsdp_transformer_layer_cls_to_wrap': 'LlamaDecoderLayer', 'deepspeed': 'None', 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': 'None', 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': "['wandb']", 'ddp_find_unused_parameters': 'None', 'ddp_bucket_cap_mb': 'None', 'dataloader_pin_memory': True, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': 'None', 'hub_model_id': 'None', 'hub_strategy': 'every_save', 'hub_token': '', 'hub_private_repo': False, 'gradient_checkpointing': True, 'include_inputs_for_metrics': False, 'fp16_backend': 'auto', 'push_to_hub_model_id': 'None', 'push_to_hub_organization': 'None', 'push_to_hub_token': '', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': 'None', 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': 'None', 'torch_compile_mode': 'None', 'cache_dir': '/code/syr/PointLLM/cache_dir', 'model_max_length': 2048, 'model_debug': False, 'fix_llm': False, 'fix_pointnet': True, 'force_fsdp': False, 'tune_mm_mlp_adapter': True, 'stage_2': True, 'pretrained_mm_mlp_adapter': '/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2/point_proj.bin', 'detatch_point_token': '', 'train_batch_size': 2, 'eval_batch_size': 1} 2025-01-26 23:56:57 - 
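For reference, the trainer configuration dumped to W&B above corresponds roughly to the `TrainingArguments` below. This is a sketch reconstructed only from the logged values; the PointLLM-specific options (fix_llm, fix_pointnet, tune_mm_mlp_adapter, stage_2, pretrained_mm_mlp_adapter, model_max_length, ...) belong to the project's own argument dataclasses and are omitted here.

```python
from transformers import TrainingArguments

# Standard HF fields visible in the config dump above (transformers 4.28-era API).
args = TrainingArguments(
    output_dir="outputs/PointLLM_train_stage2/test_stage2",
    run_name="PointLLM_train_stage2",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=32,
    learning_rate=3e-5,
    weight_decay=0.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    save_strategy="steps",
    save_steps=300,
    save_total_limit=1,
    fsdp="full_shard auto_wrap",
    fsdp_transformer_layer_cls_to_wrap="LlamaDecoderLayer",
    report_to=["wandb"],
)
```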
ERROR - stderr - 0%| | 0/468 [00:00
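The numbers in the trainer banner are internally consistent. The world size is not logged directly, but 128 / (2 per device × 32 accumulation steps) implies 2 ranks, and the 468 optimization steps then follow from the 60,000 examples; the ~3.39B "trainable parameters" figure is likewise plausibly the per-rank FSDP full_shard count (roughly half of the ~6.8B-parameter model) rather than the global total. A quick check that mirrors, approximately, how the HF Trainer derives these values:

```python
import math

# All inputs are taken from the log above; world_size is inferred, not logged.
num_examples = 60_000
per_device_batch = 2
grad_accum = 32
total_train_batch = 128

world_size = total_train_batch // (per_device_batch * grad_accum)    # -> 2
batches_per_rank = num_examples // (per_device_batch * world_size)   # -> 15000 per epoch
steps_per_epoch = batches_per_rank // grad_accum                     # -> 468 (floor of 468.75)
total_steps = math.ceil(1 * steps_per_epoch)                         # num_train_epochs = 1 -> 468

print(world_size, total_train_batch, total_steps)  # 2 128 468
```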