2025-01-26 23:45:55 - INFO - pointllm.model.pointllm - Using PointBERT.
2025-01-26 23:45:55 - INFO - stdout - Loading PointBERT config from /code/syr/PointLLM/pointllm/model/pointbert/PointTransformer_8192point_2layer.yaml.
2025-01-26 23:45:55 - INFO - pointllm.model.pointllm - Using 6 dim of points.
2025-01-26 23:45:55 - INFO - pointllm.model.pointllm - Use max pool is False. Number of point token is 513.
2025-01-26 23:45:55 - INFO - pointllm.model.pointllm - Point backbone output dim: 384.
2025-01-26 23:45:55 - INFO - pointllm.model.pointllm - Use 2 projection hiddent layers.
2025-01-26 23:45:55 - INFO - pointllm.model.pointllm - Each layer with [1024, 2048] hidden units.
2025-01-26 23:45:55 - INFO - pointllm.model.pointllm - Point projector output dim: 4096.
2025-01-26 23:45:56 - ERROR - stderr - Loading checkpoint shards: 0%| | 0/3 [00:00}
2025-01-26 23:46:22 - ERROR - stderr - These modules will be wrapped as separate FSDP instacnes with mixed precision disabled.
2025-01-26 23:46:22 - ERROR - stderr - warnings.warn(
2025-01-26 23:46:22 - ERROR - stderr - Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:22<00:00, 7.63s/it]
2025-01-26 23:46:22 - ERROR - stderr - Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:22<00:00, 7.65s/it]
2025-01-26 23:46:22 - ERROR - stderr -
2025-01-26 23:46:22 - WARNING - pointllm.train.train - LLM is trainable. Fix_llm flag is set to False
2025-01-26 23:46:22 - INFO - pointllm.train.train - Point backbone is fixed. Fix_pointnet flag is set to True, pointnet grad will not be recorded.
2025-01-26 23:46:22 - INFO - pointllm.train.train - Point projection layer is trainable.
2025-01-26 23:46:22 - INFO - stdout - Loading anno file from /code/syr/PointLLM/yj_data/PLM-Finetune/ready2use/combined_shuffled.json.
2025-01-26 23:46:22 - INFO - stdout - Using conversation_type: ['simple_description']
2025-01-26 23:46:22 - INFO - stdout - Before filtering, the dataset size is: 6.
2025-01-26 23:46:22 - INFO - stdout - After filtering, the dataset size is: 6.
2025-01-26 23:46:22 - INFO - stdout - Number of simple_description: 6
2025-01-26 23:46:22 - INFO - transformers.trainer - Using cuda_amp half precision backend
2025-01-26 23:46:22 - INFO - transformers.trainer - Using cuda_amp half precision backend
2025-01-26 23:46:22 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:118: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type:
2025-01-26 23:46:22 - ERROR - stderr - {}
2025-01-26 23:46:22 - ERROR - stderr - These modules will be wrapped as separate FSDP instacnes with mixed precision disabled.
2025-01-26 23:46:22 - ERROR - stderr - warnings.warn(
2025-01-26 23:46:25 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py:2622: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
2025-01-26 23:46:25 - ERROR - stderr - else torch.cuda.amp.autocast(cache_enabled=cache_enabled, dtype=self.amp_dtype)
2025-01-26 23:46:28 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:46:28 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:46:28 - INFO - transformers.trainer - Num examples = 6
2025-01-26 23:46:28 - INFO - transformers.trainer - Num examples = 6
2025-01-26 23:46:28 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:46:28 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:46:28 - INFO - transformers.trainer - Instantaneous batch size per device = 1
2025-01-26 23:46:28 - INFO - transformers.trainer - Instantaneous batch size per device = 1
2025-01-26 23:46:28 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 128
2025-01-26 23:46:28 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 128
2025-01-26 23:46:28 - INFO - transformers.trainer - Gradient Accumulation steps = 64
2025-01-26 23:46:28 - INFO - transformers.trainer - Gradient Accumulation steps = 64
2025-01-26 23:46:28 - INFO - transformers.trainer - Total optimization steps = 1
2025-01-26 23:46:28 - INFO - transformers.trainer - Total optimization steps = 1
2025-01-26 23:46:28 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:46:28 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:46:28 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:46:28 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:46:33 - ERROR - stderr - wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
2025-01-26 23:46:33 - INFO - wandb - Current SDK version is 0.19.4
2025-01-26 23:46:33 - INFO - wandb - Configure stats pid to 42181
2025-01-26 23:46:33 - INFO - wandb - Loading settings from /root/.config/wandb/settings
2025-01-26 23:46:33 - INFO - wandb - Loading settings from /code/syr/PointLLM/wandb/settings
2025-01-26 23:46:33 - INFO - wandb - Loading settings from environment variables
2025-01-26 23:46:33 - INFO - wandb - Logging user logs to /code/syr/PointLLM/wandb/run-20250126_234633-qw4mkw5m/logs/debug.log
2025-01-26 23:46:33 - INFO - wandb - Logging internal logs to /code/syr/PointLLM/wandb/run-20250126_234633-qw4mkw5m/logs/debug-internal.log
2025-01-26 23:46:33 - INFO - wandb - calling init triggers
2025-01-26 23:46:33 - INFO - wandb - wandb.init called with sweep_config: {} config: {}
2025-01-26 23:46:33 - INFO - wandb - starting backend
2025-01-26 23:46:33 - ERROR - stderr - wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
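For orientation, the point projector dimensions logged at the start of this attempt (384-dim PointBERT features, two hidden layers of 1024 and 2048 units, 4096-dim output matching the LLM hidden_size) correspond to an MLP along the lines of the sketch below; the GELU activations and exact layer layout are assumptions for illustration, not the actual PointLLM implementation.

    import torch.nn as nn

    # Sketch only: a two-hidden-layer projector mapping 384-dim point features
    # into the 4096-dim LLM embedding space. Activation choice is an assumption.
    point_proj = nn.Sequential(
        nn.Linear(384, 1024),
        nn.GELU(),
        nn.Linear(1024, 2048),
        nn.GELU(),
        nn.Linear(2048, 4096),
    )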
2025-01-26 23:46:33 - INFO - wandb - sending inform_init request
2025-01-26 23:46:33 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/pydantic/main.py:314: UserWarning: Pydantic serializer warnings:
2025-01-26 23:46:33 - ERROR - stderr - Expected `list[str]` but got `tuple` - serialized value may not be as expected
2025-01-26 23:46:33 - ERROR - stderr - return self.__pydantic_serializer__.to_python(
2025-01-26 23:46:33 - INFO - wandb - multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2025-01-26 23:46:33 - INFO - wandb - backend started and connected
2025-01-26 23:46:33 - DEBUG - wandb - no default config file found in config-defaults.yaml
2025-01-26 23:46:33 - INFO - wandb - updated telemetry
2025-01-26 23:46:33 - INFO - wandb - communicating run to backend with 90.0 second timeout
2025-01-26 23:46:34 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:46:35 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:46:36 - ERROR - stderr - wandb: | Waiting for wandb.init()...
2025-01-26 23:46:37 - ERROR - stderr - wandb: / Waiting for wandb.init()...
2025-01-26 23:46:38 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:46:39 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:46:40 - ERROR - stderr - wandb: | Waiting for wandb.init()...
2025-01-26 23:46:41 - ERROR - stderr - wandb: / Waiting for wandb.init()...
2025-01-26 23:46:42 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:46:43 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:46:44 - ERROR - stderr - wandb: | Waiting for wandb.init()...
2025-01-26 23:46:45 - ERROR - stderr - wandb: / Waiting for wandb.init()...
2025-01-26 23:46:46 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:46:47 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:46:48 - ERROR - stderr - wandb: | Waiting for wandb.init()...
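The wandb.init() call above keeps spinning and is interrupted by hand in the entries that follow. If syncing to the W&B servers is the bottleneck, the run can be kept offline or the integration disabled before launching; both environment switches below are standard, and the second is quoted verbatim from the transformers log above.

    import os

    os.environ["WANDB_MODE"] = "offline"    # log locally, upload later with `wandb sync`
    os.environ["WANDB_DISABLED"] = "true"   # or skip the W&B callback entirely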
2025-01-26 23:46:49 - WARNING - wandb - interrupted
Traceback (most recent call last):
  File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 1444, in init
    return wi.init(run_settings, run_config)
  File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 920, in init
    result = run_init_handle.wait(
  File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 279, in wait
    found, abandoned = self._slot._get_and_clear(timeout=wait_timeout)
  File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 126, in _get_and_clear
    if self._wait(timeout=timeout):
  File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 122, in _wait
    return self._event.wait(timeout=timeout)
  File "/opt/conda/envs/llava_unet/lib/python3.10/threading.py", line 607, in wait
    signaled = self._cond.wait(timeout)
  File "/opt/conda/envs/llava_unet/lib/python3.10/threading.py", line 324, in wait
    gotit = waiter.acquire(True, timeout)
KeyboardInterrupt
2025-01-26 23:46:49 - ERROR - stderr - Traceback (most recent call last):
2025-01-26 23:46:49 - ERROR - stderr - File "/code/syr/PointLLM/pointllm/train/train_mem.py", line 13, in
2025-01-26 23:46:49 - ERROR - stderr - train()
2025-01-26 23:46:49 - ERROR - stderr - File "/code/syr/PointLLM/pointllm/train/train.py", line 246, in train
2025-01-26 23:46:49 - ERROR - stderr - trainer.train()
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1644, in train
2025-01-26 23:46:49 - ERROR - stderr - return inner_training_loop(
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1827, in _inner_training_loop
2025-01-26 23:46:49 - ERROR - stderr - self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer_callback.py", line 353, in on_train_begin
2025-01-26 23:46:49 - ERROR - stderr - return self.call_event("on_train_begin", args, state, control)
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer_callback.py", line 397, in call_event
2025-01-26 23:46:49 - ERROR - stderr - result = getattr(callback, event)(
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/integrations.py", line 753, in on_train_begin
2025-01-26 23:46:49 - ERROR - stderr - self.setup(args, state, model, **kwargs)
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/integrations.py", line 727, in setup
2025-01-26 23:46:49 - ERROR - stderr - self._wandb.init(
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 1444, in init
2025-01-26 23:46:49 - ERROR - stderr - return wi.init(run_settings, run_config)
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 920, in init
2025-01-26 23:46:49 - ERROR - stderr - result = run_init_handle.wait(
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 279, in wait
2025-01-26 23:46:49 - ERROR - stderr - found, abandoned = self._slot._get_and_clear(timeout=wait_timeout)
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 126, in _get_and_clear
2025-01-26 23:46:49 - ERROR - stderr - if self._wait(timeout=timeout):
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 122, in _wait
2025-01-26 23:46:49 - ERROR - stderr - return self._event.wait(timeout=timeout)
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/threading.py", line 607, in wait
2025-01-26 23:46:49 - ERROR - stderr - signaled = self._cond.wait(timeout)
2025-01-26 23:46:49 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/threading.py", line 324, in wait
2025-01-26 23:46:49 - ERROR - stderr - gotit = waiter.acquire(True, timeout)
2025-01-26 23:46:49 - ERROR - stderr - KeyboardInterrupt
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: Traceback (most recent call last):
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/code/syr/PointLLM/pointllm/train/train_mem.py", line 13, in
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: train()
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/code/syr/PointLLM/pointllm/train/train.py", line 246, in train
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: trainer.train()
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1644, in train
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: return inner_training_loop(
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1827, in _inner_training_loop
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer_callback.py", line 353, in on_train_begin
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: return self.call_event("on_train_begin", args, state, control)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer_callback.py", line 397, in call_event
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: result = getattr(callback, event)(
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/integrations.py", line 753, in on_train_begin
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: self.setup(args, state, model, **kwargs)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/integrations.py", line 727, in setup
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: self._wandb.init(
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 1444, in init
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: return wi.init(run_settings, run_config)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 920, in init
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: result = run_init_handle.wait(
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 279, in wait
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: found, abandoned = self._slot._get_and_clear(timeout=wait_timeout)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 126, in _get_and_clear
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: if self._wait(timeout=timeout):
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/wandb/sdk/lib/mailbox.py", line 122, in _wait
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: return self._event.wait(timeout=timeout)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/threading.py", line 607, in wait
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: signaled = self._cond.wait(timeout)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/threading.py", line 324, in wait
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: gotit = waiter.acquire(True, timeout)
2025-01-26 23:46:49 - ERROR - stderr - [rank0]: KeyboardInterrupt
2025-01-26 23:46:49 - WARNING - wandb - message_loop has been closed
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Using PointBERT.
2025-01-26 23:48:51 - INFO - stdout - Loading PointBERT config from /code/syr/PointLLM/pointllm/model/pointbert/PointTransformer_8192point_2layer.yaml.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Using 6 dim of points.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Use max pool is False. Number of point token is 513.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Point backbone output dim: 384.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Use 2 projection hiddent layers.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Using PointBERT.
2025-01-26 23:48:51 - INFO - stdout - Loading PointBERT config from /code/syr/PointLLM/pointllm/model/pointbert/PointTransformer_8192point_2layer.yaml.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Each layer with [1024, 2048] hidden units.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Point projector output dim: 4096.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Using 6 dim of points.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Use max pool is False. Number of point token is 513.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Point backbone output dim: 384.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Use 2 projection hiddent layers.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Each layer with [1024, 2048] hidden units.
2025-01-26 23:48:51 - INFO - pointllm.model.pointllm - Point projector output dim: 4096.
2025-01-26 23:48:52 - ERROR - stderr - Loading checkpoint shards: 0%| | 0/3 [00:00}
2025-01-26 23:49:12 - ERROR - stderr - These modules will be wrapped as separate FSDP instacnes with mixed precision disabled.
2025-01-26 23:49:12 - ERROR - stderr - warnings.warn(
2025-01-26 23:49:12 - ERROR - stderr - Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████| 3/3 [00:20<00:00, 6.57s/it]
2025-01-26 23:49:12 - ERROR - stderr - Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████| 3/3 [00:20<00:00, 6.90s/it]
2025-01-26 23:49:12 - ERROR - stderr -
2025-01-26 23:49:12 - WARNING - pointllm.train.train - LLM is trainable. Fix_llm flag is set to False
2025-01-26 23:49:13 - INFO - pointllm.train.train - Point backbone is fixed. Fix_pointnet flag is set to True, pointnet grad will not be recorded.
2025-01-26 23:49:13 - INFO - pointllm.train.train - Point projection layer is trainable.
2025-01-26 23:49:13 - INFO - stdout - Loading anno file from /code/syr/PointLLM/yj_data/PLM-Finetune/ready2use/combined_shuffled.json.
2025-01-26 23:49:13 - INFO - stdout - Using conversation_type: ['simple_description']
2025-01-26 23:49:13 - INFO - stdout - Before filtering, the dataset size is: 6.
2025-01-26 23:49:13 - INFO - stdout - After filtering, the dataset size is: 6.
2025-01-26 23:49:13 - INFO - stdout - Number of simple_description: 6
2025-01-26 23:49:13 - INFO - transformers.trainer - Using cuda_amp half precision backend
2025-01-26 23:49:13 - INFO - transformers.trainer - Using cuda_amp half precision backend
2025-01-26 23:49:13 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:118: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type:
2025-01-26 23:49:13 - ERROR - stderr - {}
2025-01-26 23:49:13 - ERROR - stderr - These modules will be wrapped as separate FSDP instacnes with mixed precision disabled.
2025-01-26 23:49:13 - ERROR - stderr - warnings.warn(
2025-01-26 23:49:19 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py:2622: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
2025-01-26 23:49:19 - ERROR - stderr - else torch.cuda.amp.autocast(cache_enabled=cache_enabled, dtype=self.amp_dtype)
2025-01-26 23:49:20 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:49:20 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:49:20 - INFO - transformers.trainer - Num examples = 6
2025-01-26 23:49:20 - INFO - transformers.trainer - Num examples = 6
2025-01-26 23:49:20 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:49:20 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:49:20 - INFO - transformers.trainer - Instantaneous batch size per device = 1
2025-01-26 23:49:20 - INFO - transformers.trainer - Instantaneous batch size per device = 1
2025-01-26 23:49:20 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 128
2025-01-26 23:49:20 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 128
2025-01-26 23:49:20 - INFO - transformers.trainer - Gradient Accumulation steps = 64
2025-01-26 23:49:20 - INFO - transformers.trainer - Gradient Accumulation steps = 64
2025-01-26 23:49:20 - INFO - transformers.trainer - Total optimization steps = 1
2025-01-26 23:49:20 - INFO - transformers.trainer - Total optimization steps = 1
2025-01-26 23:49:20 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:49:20 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:49:20 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:49:20 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:49:22 - ERROR - stderr - wandb: Currently logged in as: 1282467298 (1282467298-university-of-nottingham-ningbo-china). Use `wandb login --relogin` to force relogin
2025-01-26 23:49:22 - INFO - wandb - Current SDK version is 0.19.4
2025-01-26 23:49:22 - INFO - wandb - Configure stats pid to 44018
2025-01-26 23:49:22 - INFO - wandb - Loading settings from /root/.config/wandb/settings
2025-01-26 23:49:22 - INFO - wandb - Loading settings from /code/syr/PointLLM/wandb/settings
2025-01-26 23:49:22 - INFO - wandb - Loading settings from environment variables
2025-01-26 23:49:22 - INFO - wandb - Logging user logs to /code/syr/PointLLM/wandb/run-20250126_234922-i2eep5vt/logs/debug.log
2025-01-26 23:49:22 - INFO - wandb - Logging internal logs to /code/syr/PointLLM/wandb/run-20250126_234922-i2eep5vt/logs/debug-internal.log
2025-01-26 23:49:22 - INFO - wandb - calling init triggers
2025-01-26 23:49:22 - INFO - wandb - wandb.init called with sweep_config: {} config: {}
2025-01-26 23:49:22 - INFO - wandb - starting backend
2025-01-26 23:49:22 - ERROR - stderr - wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
2025-01-26 23:49:22 - INFO - wandb - sending inform_init request
2025-01-26 23:49:22 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/pydantic/main.py:314: UserWarning: Pydantic serializer warnings:
2025-01-26 23:49:22 - ERROR - stderr - Expected `list[str]` but got `tuple` - serialized value may not be as expected
2025-01-26 23:49:22 - ERROR - stderr - return self.__pydantic_serializer__.to_python(
2025-01-26 23:49:22 - INFO - wandb - multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2025-01-26 23:49:22 - INFO - wandb - backend started and connected
2025-01-26 23:49:22 - DEBUG - wandb - no default config file found in config-defaults.yaml
2025-01-26 23:49:22 - INFO - wandb - updated telemetry
2025-01-26 23:49:22 - INFO - wandb - communicating run to backend with 90.0 second timeout
2025-01-26 23:49:23 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:49:24 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:49:25 - ERROR - stderr - wandb: | Waiting for wandb.init()...
2025-01-26 23:49:26 - ERROR - stderr - wandb: / Waiting for wandb.init()...
2025-01-26 23:49:27 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:49:28 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:49:29 - ERROR - stderr - wandb: | Waiting for wandb.init()...
2025-01-26 23:49:30 - ERROR - stderr - wandb: / Waiting for wandb.init()...
2025-01-26 23:49:31 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:49:32 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:49:33 - ERROR - stderr - wandb: | Waiting for wandb.init()...
2025-01-26 23:49:34 - ERROR - stderr - wandb: / Waiting for wandb.init()...
2025-01-26 23:49:35 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:49:35 - INFO - wandb - starting run threads in backend
2025-01-26 23:49:35 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/pydantic/main.py:314: UserWarning: Pydantic serializer warnings:
2025-01-26 23:49:35 - ERROR - stderr - Expected `list[str]` but got `tuple` - serialized value may not be as expected
2025-01-26 23:49:35 - ERROR - stderr - return self.__pydantic_serializer__.to_python(
2025-01-26 23:49:35 - ERROR - stderr - wandb: Tracking run with wandb version 0.19.4
2025-01-26 23:49:35 - ERROR - stderr - wandb: Run data is saved locally in /code/syr/PointLLM/wandb/run-20250126_234922-i2eep5vt
2025-01-26 23:49:35 - ERROR - stderr - wandb: Run `wandb offline` to turn off syncing.
2025-01-26 23:49:35 - ERROR - stderr - wandb: Syncing run PointLLM_train_stage2
2025-01-26 23:49:35 - ERROR - stderr - wandb: ⭐️ View project at https://wandb.ai/1282467298-university-of-nottingham-ningbo-china/huggingface
2025-01-26 23:49:35 - ERROR - stderr - wandb: 🚀 View run at https://wandb.ai/1282467298-university-of-nottingham-ningbo-china/huggingface/runs/i2eep5vt
2025-01-26 23:49:35 - DEBUG - wandb - Saving list of pip packages installed into the current environment
2025-01-26 23:49:35 - INFO - wandb - atexit reg
2025-01-26 23:49:35 - INFO - wandb - redirect: wrap_raw
2025-01-26 23:49:35 - INFO - wandb - Wrapping output streams.
2025-01-26 23:49:35 - INFO - wandb - Redirects installed.
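The FutureWarning emitted from transformers/trainer.py:2622 earlier in this attempt points at the renamed autocast entry point; below is a minimal sketch of the suggested spelling, with bfloat16 assumed to match the bf16=True setting in this run's config.

    import torch

    # torch.amp.autocast replaces the deprecated torch.cuda.amp.autocast spelling.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(4, 4, device=device)
    with torch.amp.autocast(device, dtype=torch.bfloat16):
        y = x @ x  # matmul runs under autocast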
2025-01-26 23:49:35 - INFO - wandb - run started, returning control to user process 2025-01-26 23:49:35 - INFO - wandb - config_cb None None {'vocab_size': 32003, 'hidden_size': 4096, 'intermediate_size': 11008, 'num_hidden_layers': 32, 'num_attention_heads': 32, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'use_cache': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float32', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': False, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'chunk_size_feed_forward': 0, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['PointLLMLlamaForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 1, 'pad_token_id': 0, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': '/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2', 'transformers_version': '4.28.0.dev0', 'DEFAULT_POINT_END_TOKEN': '', 'DEFAULT_POINT_PATCH_TOKEN': '', 'DEFAULT_POINT_START_TOKEN': '', 'max_position_embeddings': 2048, 'mm_use_point_start_end': True, 'model_type': 'pointllm', 'point_backbone': 'PointBERT', 'point_backbone_ckpt': '/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2/point_bert_v1.2.pt', 'point_backbone_config_name': 'PointTransformer_8192point_2layer', 'use_color': True, 'output_dir': 'outputs/PointLLM_train_stage2/test_stage2', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'evaluation_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 1, 'per_gpu_train_batch_size': 'None', 'per_gpu_eval_batch_size': 'None', 'gradient_accumulation_steps': 64, 'eval_accumulation_steps': 'None', 'eval_delay': 0, 'learning_rate': 3e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 1, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'warmup_ratio': 0.03, 'warmup_steps': 0, 'log_level': 'info', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': 'outputs/PointLLM_train_stage2/test_stage2/runs/Jan26_23-48-13_audio-73426-task1-0', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 1, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 300, 'save_total_limit': 1, 'save_on_each_node': False, 'no_cuda': False, 'use_mps_device': False, 'seed': 42, 'data_seed': 'None', 'jit_mode_eval': False, 'use_ipex': False, 'bf16': True, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'cuda_amp', 'bf16_full_eval': False, 'fp16_full_eval': 
False, 'tf32': 'None', 'local_rank': 0, 'xpu_backend': 'None', 'tpu_num_cores': 'None', 'tpu_metrics_debug': False, 'debug': '[]', 'dataloader_drop_last': False, 'eval_steps': 100, 'dataloader_num_workers': 0, 'past_index': -1, 'run_name': 'PointLLM_train_stage2', 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': 'None', 'load_best_model_at_end': False, 'metric_for_best_model': 'None', 'greater_is_better': 'None', 'ignore_data_skip': False, 'sharded_ddp': '[]', 'fsdp': "['full_shard', 'auto_wrap']", 'fsdp_min_num_params': 0, 'fsdp_config': "{'fsdp_min_num_params': 0, 'fsdp_transformer_layer_cls_to_wrap': ['LlamaDecoderLayer'], 'xla': False, 'xla_fsdp_grad_ckpt': False}", 'fsdp_transformer_layer_cls_to_wrap': 'LlamaDecoderLayer', 'deepspeed': 'None', 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': 'None', 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': "['wandb']", 'ddp_find_unused_parameters': 'None', 'ddp_bucket_cap_mb': 'None', 'dataloader_pin_memory': True, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': 'None', 'hub_model_id': 'None', 'hub_strategy': 'every_save', 'hub_token': '', 'hub_private_repo': False, 'gradient_checkpointing': True, 'include_inputs_for_metrics': False, 'fp16_backend': 'auto', 'push_to_hub_model_id': 'None', 'push_to_hub_organization': 'None', 'push_to_hub_token': '', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': 'None', 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': 'None', 'torch_compile_mode': 'None', 'cache_dir': '/code/syr/PointLLM/cache_dir', 'model_max_length': 2048, 'model_debug': False, 'fix_llm': False, 'fix_pointnet': True, 'force_fsdp': False, 'tune_mm_mlp_adapter': True, 'stage_2': True, 'pretrained_mm_mlp_adapter': '/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2/point_proj.bin', 'detatch_point_token': '', 'train_batch_size': 1, 'eval_batch_size': 1}
2025-01-26 23:49:35 - ERROR - stderr - 0%| | 0/1 [00:00
2025-01-26 23:49:44 - ERROR - stderr - train()
2025-01-26 23:49:44 - ERROR - stderr - File "/code/syr/PointLLM/pointllm/train/train.py", line 246, in train
2025-01-26 23:49:44 - ERROR - stderr - trainer.train()
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1644, in train
2025-01-26 23:49:44 - ERROR - stderr - return inner_training_loop(
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1978, in _inner_training_loop
2025-01-26 23:49:44 - ERROR - stderr - self.optimizer.step()
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
2025-01-26 23:49:44 - ERROR - stderr - return func.__get__(opt, opt.__class__)(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 487, in wrapper
2025-01-26 23:49:44 - ERROR - stderr - out = func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 91, in _use_grad
2025-01-26 23:49:44 - ERROR - stderr - ret = func(self, *args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 220, in step
2025-01-26 23:49:44 - ERROR - stderr - adamw(
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
2025-01-26 23:49:44 - ERROR - stderr - return func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 782, in adamw
2025-01-26 23:49:44 - ERROR - stderr - func(
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
2025-01-26 23:49:44 - ERROR - stderr - grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
2025-01-26 23:49:44 - ERROR - stderr - return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices) # type: ignore[return-value, arg-type]
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-01-26 23:49:44 - ERROR - stderr - return func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
2025-01-26 23:49:44 - ERROR - stderr - return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
2025-01-26 23:49:44 - ERROR - stderr - RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: Traceback (most recent call last):
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/code/syr/PointLLM/pointllm/train/train_mem.py", line 13, in
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: train()
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/code/syr/PointLLM/pointllm/train/train.py", line 246, in train
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: trainer.train()
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1644, in train
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: return inner_training_loop(
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1978, in _inner_training_loop
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: self.optimizer.step()
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: return func.__get__(opt, opt.__class__)(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 487, in wrapper
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: out = func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 91, in _use_grad
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: ret = func(self, *args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 220, in step
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: adamw(
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: return func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 782, in adamw
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: func(
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices) # type: ignore[return-value, arg-type]
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: return func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
2025-01-26 23:49:44 - ERROR - stderr - [rank0]: RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
2025-01-26 23:49:44 - WARNING - wandb - message_loop has been closed
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: Traceback (most recent call last):
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/code/syr/PointLLM/pointllm/train/train_mem.py", line 13, in
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: train()
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/code/syr/PointLLM/pointllm/train/train.py", line 246, in train
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: trainer.train()
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1644, in train
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: return inner_training_loop(
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py", line 1978, in _inner_training_loop
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: self.optimizer.step()
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: return func.__get__(opt, opt.__class__)(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 487, in wrapper
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: out = func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 91, in _use_grad
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: ret = func(self, *args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 220, in step
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: adamw(
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 154, in maybe_fallback
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: return func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 782, in adamw
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: func(
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/adamw.py", line 480, in _multi_tensor_adamw
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/optim/optimizer.py", line 516, in _group_tensors_by_device_and_dtype
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices) # type: ignore[return-value, arg-type]
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: return func(*args, **kwargs)
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: File "/opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/utils/_foreach_utils.py", line 37, in _group_tensors_by_device_and_dtype
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: return torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
2025-01-26 23:49:44 - ERROR - stderr - [rank1]: RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32/64 notwithstanding
2025-01-26 23:52:51 - INFO - pointllm.model.pointllm - Using PointBERT.
2025-01-26 23:52:51 - INFO - stdout - Loading PointBERT config from /code/syr/PointLLM/pointllm/model/pointbert/PointTransformer_8192point_2layer.yaml.
2025-01-26 23:52:51 - INFO - pointllm.model.pointllm - Using 6 dim of points.
2025-01-26 23:52:51 - INFO - pointllm.model.pointllm - Use max pool is False. Number of point token is 513.
2025-01-26 23:52:51 - INFO - pointllm.model.pointllm - Point backbone output dim: 384.
2025-01-26 23:52:51 - INFO - pointllm.model.pointllm - Use 2 projection hiddent layers.
2025-01-26 23:52:51 - INFO - pointllm.model.pointllm - Using PointBERT.
2025-01-26 23:52:51 - INFO - stdout - Loading PointBERT config from /code/syr/PointLLM/pointllm/model/pointbert/PointTransformer_8192point_2layer.yaml.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Each layer with [1024, 2048] hidden units.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Point projector output dim: 4096.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Using 6 dim of points.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Use max pool is False. Number of point token is 513.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Point backbone output dim: 384.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Use 2 projection hiddent layers.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Each layer with [1024, 2048] hidden units.
2025-01-26 23:52:52 - INFO - pointllm.model.pointllm - Point projector output dim: 4096.
2025-01-26 23:52:52 - ERROR - stderr - Loading checkpoint shards: 0%| | 0/3 [00:00}
2025-01-26 23:53:11 - ERROR - stderr - These modules will be wrapped as separate FSDP instacnes with mixed precision disabled.
2025-01-26 23:53:11 - ERROR - stderr - warnings.warn(
2025-01-26 23:53:13 - ERROR - stderr - Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:20<00:00, 6.52s/it]
2025-01-26 23:53:13 - ERROR - stderr - Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:20<00:00, 6.76s/it]
2025-01-26 23:53:13 - ERROR - stderr -
2025-01-26 23:53:13 - WARNING - pointllm.train.train - LLM is trainable. Fix_llm flag is set to False
2025-01-26 23:53:13 - INFO - pointllm.train.train - Point backbone is fixed. Fix_pointnet flag is set to True, pointnet grad will not be recorded.
2025-01-26 23:53:13 - INFO - pointllm.train.train - Point projection layer is trainable.
2025-01-26 23:53:13 - INFO - stdout - Loading anno file from /code/syr/PointLLM/yj_data/PLM-Finetune/ready2use/combined_shuffled.json.
2025-01-26 23:53:13 - INFO - stdout - Using conversation_type: ['simple_description']
2025-01-26 23:53:13 - INFO - stdout - Before filtering, the dataset size is: 6.
2025-01-26 23:53:13 - INFO - stdout - After filtering, the dataset size is: 6.
2025-01-26 23:53:13 - INFO - stdout - Number of simple_description: 6
2025-01-26 23:53:13 - INFO - transformers.trainer - Using cuda_amp half precision backend
2025-01-26 23:53:13 - INFO - transformers.trainer - Using cuda_amp half precision backend
2025-01-26 23:53:13 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:118: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type:
2025-01-26 23:53:13 - ERROR - stderr - {}
2025-01-26 23:53:13 - ERROR - stderr - These modules will be wrapped as separate FSDP instacnes with mixed precision disabled.
2025-01-26 23:53:13 - ERROR - stderr - warnings.warn(
2025-01-26 23:53:16 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py:2622: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
2025-01-26 23:53:16 - ERROR - stderr - else torch.cuda.amp.autocast(cache_enabled=cache_enabled, dtype=self.amp_dtype)
2025-01-26 23:53:19 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:53:19 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:53:19 - INFO - transformers.trainer - Num examples = 6
2025-01-26 23:53:19 - INFO - transformers.trainer - Num examples = 6
2025-01-26 23:53:19 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:53:19 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:53:19 - INFO - transformers.trainer - Instantaneous batch size per device = 2
2025-01-26 23:53:19 - INFO - transformers.trainer - Instantaneous batch size per device = 2
2025-01-26 23:53:19 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 4
2025-01-26 23:53:19 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 4
2025-01-26 23:53:19 - INFO - transformers.trainer - Gradient Accumulation steps = 1
2025-01-26 23:53:19 - INFO - transformers.trainer - Gradient Accumulation steps = 1
2025-01-26 23:53:19 - INFO - transformers.trainer - Total optimization steps = 2
2025-01-26 23:53:19 - INFO - transformers.trainer - Total optimization steps = 2
2025-01-26 23:53:19 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:53:19 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:53:19 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:53:19 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:53:24 - ERROR - stderr - wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
2025-01-26 23:53:24 - INFO - wandb - Current SDK version is 0.19.4
2025-01-26 23:53:24 - INFO - wandb - Configure stats pid to 46170
2025-01-26 23:53:24 - INFO - wandb - Loading settings from /root/.config/wandb/settings
2025-01-26 23:53:24 - INFO - wandb - Loading settings from /code/syr/PointLLM/wandb/settings
2025-01-26 23:53:24 - INFO - wandb - Loading settings from environment variables
2025-01-26 23:53:24 - INFO - wandb - Logging user logs to /code/syr/PointLLM/wandb/run-20250126_235324-odc8wu1v/logs/debug.log
2025-01-26 23:53:24 - INFO - wandb - Logging internal logs to /code/syr/PointLLM/wandb/run-20250126_235324-odc8wu1v/logs/debug-internal.log
2025-01-26 23:53:24 - INFO - wandb - calling init triggers
2025-01-26 23:53:24 - INFO - wandb - wandb.init called with sweep_config: {} config: {}
2025-01-26 23:53:24 - INFO - wandb - starting backend
2025-01-26 23:53:24 - ERROR - stderr - wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
2025-01-26 23:53:24 - INFO - wandb - sending inform_init request
2025-01-26 23:53:24 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/pydantic/main.py:314: UserWarning: Pydantic serializer warnings:
2025-01-26 23:53:24 - ERROR - stderr - Expected `list[str]` but got `tuple` - serialized value may not be as expected
2025-01-26 23:53:24 - ERROR - stderr - return self.__pydantic_serializer__.to_python(
2025-01-26 23:53:24 - INFO - wandb - multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2025-01-26 23:53:24 - INFO - wandb - backend started and connected
2025-01-26 23:53:24 - DEBUG - wandb - no default config file found in config-defaults.yaml
2025-01-26 23:53:24 - INFO - wandb - updated telemetry
2025-01-26 23:53:24 - INFO - wandb - communicating run to backend with 90.0 second timeout
2025-01-26 23:53:25 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:53:25 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
2025-01-26 23:53:25 - INFO - wandb - starting run threads in backend
2025-01-26 23:53:25 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/pydantic/main.py:314: UserWarning: Pydantic serializer warnings:
2025-01-26 23:53:25 - ERROR - stderr - Expected `list[str]` but got `tuple` - serialized value may not be as expected
2025-01-26 23:53:25 - ERROR - stderr - return self.__pydantic_serializer__.to_python(
2025-01-26 23:53:25 - ERROR - stderr - wandb: Tracking run with wandb version 0.19.4
2025-01-26 23:53:25 - ERROR - stderr - wandb: Run data is saved locally in /code/syr/PointLLM/wandb/run-20250126_235324-odc8wu1v
2025-01-26 23:53:25 - ERROR - stderr - wandb: Run `wandb offline` to turn off syncing.
2025-01-26 23:53:25 - ERROR - stderr - wandb: Syncing run PointLLM_train_stage2
2025-01-26 23:53:25 - ERROR - stderr - wandb: ⭐️ View project at https://wandb.ai/1282467298-university-of-nottingham-ningbo-china/huggingface
2025-01-26 23:53:25 - ERROR - stderr - wandb: 🚀 View run at https://wandb.ai/1282467298-university-of-nottingham-ningbo-china/huggingface/runs/odc8wu1v
2025-01-26 23:53:25 - DEBUG - wandb - Saving list of pip packages installed into the current environment
2025-01-26 23:53:26 - INFO - wandb - atexit reg
2025-01-26 23:53:26 - INFO - wandb - redirect: wrap_raw
2025-01-26 23:53:26 - INFO - wandb - Wrapping output streams.
2025-01-26 23:53:26 - INFO - wandb - Redirects installed.
2025-01-26 23:53:26 - INFO - wandb - run started, returning control to user process
2025-01-26 23:53:26 - INFO - wandb - config_cb None None {'vocab_size': 32003, 'hidden_size': 4096, 'intermediate_size': 11008, 'num_hidden_layers': 32, 'num_attention_heads': 32, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'use_cache': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float32', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': False, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'chunk_size_feed_forward': 0, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['PointLLMLlamaForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 1, 'pad_token_id': 0, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': '/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2', 'transformers_version': '4.28.0.dev0', 'DEFAULT_POINT_END_TOKEN': '', 'DEFAULT_POINT_PATCH_TOKEN': '', 'DEFAULT_POINT_START_TOKEN': '', 'max_position_embeddings': 2048, 'mm_use_point_start_end': True, 'model_type': 'pointllm', 'point_backbone': 'PointBERT', 'point_backbone_ckpt':
'/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2/point_bert_v1.2.pt', 'point_backbone_config_name': 'PointTransformer_8192point_2layer', 'use_color': True, 'output_dir': 'outputs/PointLLM_train_stage2/test_stage2', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'evaluation_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 2, 'per_device_eval_batch_size': 1, 'per_gpu_train_batch_size': 'None', 'per_gpu_eval_batch_size': 'None', 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': 'None', 'eval_delay': 0, 'learning_rate': 3e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 1, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'warmup_ratio': 0.03, 'warmup_steps': 0, 'log_level': 'info', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': 'outputs/PointLLM_train_stage2/test_stage2/runs/Jan26_23-52-14_audio-73426-task1-0', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 1, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 300, 'save_total_limit': 1, 'save_on_each_node': False, 'no_cuda': False, 'use_mps_device': False, 'seed': 42, 'data_seed': 'None', 'jit_mode_eval': False, 'use_ipex': False, 'bf16': True, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'cuda_amp', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': 'None', 'local_rank': 0, 'xpu_backend': 'None', 'tpu_num_cores': 'None', 'tpu_metrics_debug': False, 'debug': '[]', 'dataloader_drop_last': False, 'eval_steps': 100, 'dataloader_num_workers': 0, 'past_index': -1, 'run_name': 'PointLLM_train_stage2', 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': 'None', 'load_best_model_at_end': False, 'metric_for_best_model': 'None', 'greater_is_better': 'None', 'ignore_data_skip': False, 'sharded_ddp': '[]', 'fsdp': "['full_shard', 'auto_wrap']", 'fsdp_min_num_params': 0, 'fsdp_config': "{'fsdp_min_num_params': 0, 'fsdp_transformer_layer_cls_to_wrap': ['LlamaDecoderLayer'], 'xla': False, 'xla_fsdp_grad_ckpt': False}", 'fsdp_transformer_layer_cls_to_wrap': 'LlamaDecoderLayer', 'deepspeed': 'None', 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': 'None', 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': "['wandb']", 'ddp_find_unused_parameters': 'None', 'ddp_bucket_cap_mb': 'None', 'dataloader_pin_memory': True, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': 'None', 'hub_model_id': 'None', 'hub_strategy': 'every_save', 'hub_token': '', 'hub_private_repo': False, 'gradient_checkpointing': True, 'include_inputs_for_metrics': False, 'fp16_backend': 'auto', 'push_to_hub_model_id': 'None', 'push_to_hub_organization': 'None', 'push_to_hub_token': '', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': 'None', 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': 'None', 'torch_compile_mode': 'None', 'cache_dir': '/code/syr/PointLLM/cache_dir', 'model_max_length': 2048, 'model_debug': False, 'fix_llm': False, 'fix_pointnet': True, 'force_fsdp': False, 'tune_mm_mlp_adapter': True, 'stage_2': True, 'pretrained_mm_mlp_adapter': '/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2/point_proj.bin', 'detatch_point_token': '', 'train_batch_size': 2, 'eval_batch_size': 1} 2025-01-26 23:53:26 - 
ERROR - stderr - 0%| | 0/2 [00:00}
2025-01-26 23:56:48 - ERROR - stderr - These modules will be wrapped as separate FSDP instances with mixed precision disabled.
2025-01-26 23:56:48 - ERROR - stderr - warnings.warn(
2025-01-26 23:56:48 - ERROR - stderr - Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:26<00:00, 8.54s/it]
2025-01-26 23:56:48 - ERROR - stderr - Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:26<00:00, 8.79s/it]
2025-01-26 23:56:48 - ERROR - stderr -
2025-01-26 23:56:48 - WARNING - pointllm.train.train - LLM is trainable. Fix_llm flag is set to False
2025-01-26 23:56:48 - INFO - pointllm.train.train - Point backbone is fixed. Fix_pointnet flag is set to True, pointnet grad will not be recorded.
2025-01-26 23:56:48 - INFO - pointllm.train.train - Point projection layer is trainable.
2025-01-26 23:56:48 - INFO - stdout - Loading anno file from /code/syr/PointLLM/yj_data/PLM-Finetune/ready2use/combined_shuffled_orin.json.
2025-01-26 23:56:49 - INFO - stdout - Using conversation_type: ['simple_description']
2025-01-26 23:56:49 - INFO - stdout - Before filtering, the dataset size is: 60000.
2025-01-26 23:56:49 - INFO - stdout - After filtering, the dataset size is: 60000.
2025-01-26 23:56:49 - INFO - stdout - Number of simple_description: 60000
2025-01-26 23:56:49 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:118: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type:
2025-01-26 23:56:49 - ERROR - stderr - {}
2025-01-26 23:56:49 - ERROR - stderr - These modules will be wrapped as separate FSDP instances with mixed precision disabled.
2025-01-26 23:56:49 - ERROR - stderr - warnings.warn(
2025-01-26 23:56:54 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:56:54 - INFO - transformers.trainer - ***** Running training *****
2025-01-26 23:56:54 - INFO - transformers.trainer - Num examples = 60000
2025-01-26 23:56:54 - INFO - transformers.trainer - Num examples = 60000
2025-01-26 23:56:54 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:56:54 - INFO - transformers.trainer - Num Epochs = 1
2025-01-26 23:56:54 - INFO - transformers.trainer - Instantaneous batch size per device = 2
2025-01-26 23:56:54 - INFO - transformers.trainer - Instantaneous batch size per device = 2
2025-01-26 23:56:54 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 128
2025-01-26 23:56:54 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 128
2025-01-26 23:56:54 - INFO - transformers.trainer - Gradient Accumulation steps = 32
2025-01-26 23:56:54 - INFO - transformers.trainer - Gradient Accumulation steps = 32
2025-01-26 23:56:54 - INFO - transformers.trainer - Total optimization steps = 468
2025-01-26 23:56:54 - INFO - transformers.trainer - Total optimization steps = 468
2025-01-26 23:56:54 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:56:54 - INFO - transformers.trainer - Number of trainable parameters = 3385592768
2025-01-26 23:56:54 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:56:54 - INFO - transformers.integrations - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
2025-01-26 23:56:54 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/transformers/trainer.py:2622: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
2025-01-26 23:56:54 - ERROR - stderr - else torch.cuda.amp.autocast(cache_enabled=cache_enabled, dtype=self.amp_dtype)
2025-01-26 23:56:56 - ERROR - stderr - wandb: Currently logged in as: 1282467298 (1282467298-university-of-nottingham-ningbo-china). Use `wandb login --relogin` to force relogin
2025-01-26 23:56:56 - INFO - wandb - Current SDK version is 0.19.4
2025-01-26 23:56:56 - INFO - wandb - Configure stats pid to 47909
2025-01-26 23:56:56 - INFO - wandb - Loading settings from /root/.config/wandb/settings
2025-01-26 23:56:56 - INFO - wandb - Loading settings from /code/syr/PointLLM/wandb/settings
2025-01-26 23:56:56 - INFO - wandb - Loading settings from environment variables
2025-01-26 23:56:56 - INFO - wandb - Logging user logs to /code/syr/PointLLM/wandb/run-20250126_235656-w1cwgs9g/logs/debug.log
2025-01-26 23:56:56 - INFO - wandb - Logging internal logs to /code/syr/PointLLM/wandb/run-20250126_235656-w1cwgs9g/logs/debug-internal.log
2025-01-26 23:56:56 - INFO - wandb - calling init triggers
2025-01-26 23:56:56 - INFO - wandb - wandb.init called with sweep_config: {} config: {}
2025-01-26 23:56:56 - INFO - wandb - starting backend
2025-01-26 23:56:56 - ERROR - stderr - wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
2025-01-26 23:56:56 - INFO - wandb - sending inform_init request
2025-01-26 23:56:56 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/pydantic/main.py:314: UserWarning: Pydantic serializer warnings:
2025-01-26 23:56:56 - ERROR - stderr - Expected `list[str]` but got `tuple` - serialized value may not be as expected
2025-01-26 23:56:56 - ERROR - stderr - return self.__pydantic_serializer__.to_python(
2025-01-26 23:56:56 - INFO - wandb - multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2025-01-26 23:56:56 - INFO - wandb - backend started and connected
2025-01-26 23:56:56 - DEBUG - wandb - no default config file found in config-defaults.yaml
2025-01-26 23:56:56 - INFO - wandb - updated telemetry
2025-01-26 23:56:56 - INFO - wandb - communicating run to backend with 90.0 second timeout
2025-01-26 23:56:57 - ERROR - stderr - wandb: - Waiting for wandb.init()...
2025-01-26 23:56:57 - ERROR - stderr - wandb: \ Waiting for wandb.init()...
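The FutureWarning above comes from the transformers 4.28-era trainer still using the old `torch.cuda.amp.autocast` spelling; it is harmless for this run, and on recent PyTorch the equivalent call just takes the device type explicitly. A quick illustration (bfloat16 chosen to match bf16=True in this config):

```python
import torch

# Old spelling, still functional but deprecated (what the warning points at):
with torch.cuda.amp.autocast(dtype=torch.bfloat16, cache_enabled=True):
    ...  # forward pass under autocast

# Replacement recommended by the warning:
with torch.amp.autocast("cuda", dtype=torch.bfloat16, cache_enabled=True):
    ...  # same behavior, explicit device type
```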
2025-01-26 23:56:57 - INFO - wandb - starting run threads in backend
2025-01-26 23:56:57 - ERROR - stderr - /opt/conda/envs/llava_unet/lib/python3.10/site-packages/pydantic/main.py:314: UserWarning: Pydantic serializer warnings:
2025-01-26 23:56:57 - ERROR - stderr - Expected `list[str]` but got `tuple` - serialized value may not be as expected
2025-01-26 23:56:57 - ERROR - stderr - return self.__pydantic_serializer__.to_python(
2025-01-26 23:56:57 - ERROR - stderr - wandb: Tracking run with wandb version 0.19.4
2025-01-26 23:56:57 - ERROR - stderr - wandb: Run data is saved locally in /code/syr/PointLLM/wandb/run-20250126_235656-w1cwgs9g
2025-01-26 23:56:57 - ERROR - stderr - wandb: Run `wandb offline` to turn off syncing.
2025-01-26 23:56:57 - ERROR - stderr - wandb: Syncing run PointLLM_train_stage2
2025-01-26 23:56:57 - ERROR - stderr - wandb: ⭐️ View project at https://wandb.ai/1282467298-university-of-nottingham-ningbo-china/huggingface
2025-01-26 23:56:57 - ERROR - stderr - wandb: 🚀 View run at https://wandb.ai/1282467298-university-of-nottingham-ningbo-china/huggingface/runs/w1cwgs9g
2025-01-26 23:56:57 - DEBUG - wandb - Saving list of pip packages installed into the current environment
2025-01-26 23:56:57 - INFO - wandb - atexit reg
2025-01-26 23:56:57 - INFO - wandb - redirect: wrap_raw
2025-01-26 23:56:57 - INFO - wandb - Wrapping output streams.
2025-01-26 23:56:57 - INFO - wandb - Redirects installed.
2025-01-26 23:56:57 - INFO - wandb - run started, returning control to user process
2025-01-26 23:56:57 - INFO - wandb - config_cb None None {'vocab_size': 32003, 'hidden_size': 4096, 'intermediate_size': 11008, 'num_hidden_layers': 32, 'num_attention_heads': 32, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'use_cache': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float32', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': False, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'chunk_size_feed_forward': 0, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['PointLLMLlamaForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 1, 'pad_token_id': 0, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': '/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2', 'transformers_version': '4.28.0.dev0', 'DEFAULT_POINT_END_TOKEN': '', 'DEFAULT_POINT_PATCH_TOKEN': '', 'DEFAULT_POINT_START_TOKEN': '', 'max_position_embeddings': 2048, 'mm_use_point_start_end': True, 'model_type': 'pointllm', 'point_backbone': 'PointBERT', 'point_backbone_ckpt':
'/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2/point_bert_v1.2.pt', 'point_backbone_config_name': 'PointTransformer_8192point_2layer', 'use_color': True, 'output_dir': 'outputs/PointLLM_train_stage2/test_stage2', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'evaluation_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 2, 'per_device_eval_batch_size': 1, 'per_gpu_train_batch_size': 'None', 'per_gpu_eval_batch_size': 'None', 'gradient_accumulation_steps': 32, 'eval_accumulation_steps': 'None', 'eval_delay': 0, 'learning_rate': 3e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 1, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'warmup_ratio': 0.03, 'warmup_steps': 0, 'log_level': 'info', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': 'outputs/PointLLM_train_stage2/test_stage2/runs/Jan26_23-55-42_audio-73426-task1-0', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 1, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 300, 'save_total_limit': 1, 'save_on_each_node': False, 'no_cuda': False, 'use_mps_device': False, 'seed': 42, 'data_seed': 'None', 'jit_mode_eval': False, 'use_ipex': False, 'bf16': True, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'cuda_amp', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': 'None', 'local_rank': 0, 'xpu_backend': 'None', 'tpu_num_cores': 'None', 'tpu_metrics_debug': False, 'debug': '[]', 'dataloader_drop_last': False, 'eval_steps': 100, 'dataloader_num_workers': 0, 'past_index': -1, 'run_name': 'PointLLM_train_stage2', 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': 'None', 'load_best_model_at_end': False, 'metric_for_best_model': 'None', 'greater_is_better': 'None', 'ignore_data_skip': False, 'sharded_ddp': '[]', 'fsdp': "['full_shard', 'auto_wrap']", 'fsdp_min_num_params': 0, 'fsdp_config': "{'fsdp_min_num_params': 0, 'fsdp_transformer_layer_cls_to_wrap': ['LlamaDecoderLayer'], 'xla': False, 'xla_fsdp_grad_ckpt': False}", 'fsdp_transformer_layer_cls_to_wrap': 'LlamaDecoderLayer', 'deepspeed': 'None', 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': 'None', 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': "['wandb']", 'ddp_find_unused_parameters': 'None', 'ddp_bucket_cap_mb': 'None', 'dataloader_pin_memory': True, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': 'None', 'hub_model_id': 'None', 'hub_strategy': 'every_save', 'hub_token': '', 'hub_private_repo': False, 'gradient_checkpointing': True, 'include_inputs_for_metrics': False, 'fp16_backend': 'auto', 'push_to_hub_model_id': 'None', 'push_to_hub_organization': 'None', 'push_to_hub_token': '', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': 'None', 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': 'None', 'torch_compile_mode': 'None', 'cache_dir': '/code/syr/PointLLM/cache_dir', 'model_max_length': 2048, 'model_debug': False, 'fix_llm': False, 'fix_pointnet': True, 'force_fsdp': False, 'tune_mm_mlp_adapter': True, 'stage_2': True, 'pretrained_mm_mlp_adapter': '/code/syr/PointLLM/checkpoints/PointLLM_7B_v1.2/point_proj.bin', 'detatch_point_token': '', 'train_batch_size': 2, 'eval_batch_size': 1} 2025-01-26 23:56:57 - 
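For reference, the trainer configuration dumped to W&B above corresponds roughly to the `TrainingArguments` below. This is a sketch reconstructed only from the logged values; the PointLLM-specific options (fix_llm, fix_pointnet, tune_mm_mlp_adapter, stage_2, pretrained_mm_mlp_adapter, model_max_length, ...) belong to the project's own argument dataclasses and are omitted here.

```python
from transformers import TrainingArguments

# Standard HF fields visible in the config dump above (transformers 4.28-era API).
args = TrainingArguments(
    output_dir="outputs/PointLLM_train_stage2/test_stage2",
    run_name="PointLLM_train_stage2",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=32,
    learning_rate=3e-5,
    weight_decay=0.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    save_strategy="steps",
    save_steps=300,
    save_total_limit=1,
    fsdp="full_shard auto_wrap",
    fsdp_transformer_layer_cls_to_wrap="LlamaDecoderLayer",
    report_to=["wandb"],
)
```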
ERROR - stderr - 0%| | 0/468 [00:00
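The numbers in the trainer banner are internally consistent. The world size is not logged directly, but 128 / (2 per device × 32 accumulation steps) implies 2 ranks, and the 468 optimization steps then follow from the 60,000 examples; the ~3.39B "trainable parameters" figure is likewise plausibly the per-rank FSDP full_shard count (roughly half of the ~6.8B-parameter model) rather than the global total. A quick check that mirrors, approximately, how the HF Trainer derives these values:

```python
import math

# All inputs are taken from the log above; world_size is inferred, not logged.
num_examples = 60_000
per_device_batch = 2
grad_accum = 32
total_train_batch = 128

world_size = total_train_batch // (per_device_batch * grad_accum)    # -> 2
batches_per_rank = num_examples // (per_device_batch * world_size)   # -> 15000 per epoch
steps_per_epoch = batches_per_rank // grad_accum                     # -> 468 (floor of 468.75)
total_steps = math.ceil(1 * steps_per_epoch)                         # num_train_epochs = 1 -> 468

print(world_size, total_train_batch, total_steps)  # 2 128 468
```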