|
Sun Feb 9 17:38:13 2025: Configuration file path: shell/lzq/eagle_commercial_llama3_2_3b_eagle_v11_gl_16k_128gpus_ga1.sh |
|
Sun Feb 9 17:38:13 2025: Output directory path: work_dirs/eagle_commercial_llama3_2_3b_eagle_v11_gl_16k_128gpus_ga1 |
|
Sun Feb 9 17:38:13 2025: GPUs to use: 128 |
|
Sun Feb 9 17:38:13 2025: DEPENDENT_CLONES to use: 7 |
|
Sun Feb 9 17:38:13 2025: Use batch short: False |
|
Sun Feb 9 17:38:13 2025: Use container: /home/zhidingy/workspace/dockers/torch_video.sqsh |
|
Sun Feb 9 17:38:13 2025: nnodes to use: 16 |
|
Sun Feb 9 17:39:09 2025: monitor jobs in order [2901405, 2901408, 2901410, 2901413, 2901417, 2901419, 2901422, 2901425] |
|
Sun Feb 9 17:39:09 2025: If you want scancel these jobs use this cmd |
|
scancel 2901405 2901408 2901410 2901413 2901417 2901419 2901422 2901425 |
|
Sun Feb 9 17:39:09 2025: begin monitor job 2901405 |
|
Sun Feb 9 17:39:09 2025: check job: 2901405 |
|
Sun Feb 9 17:39:20 2025: job 2901405 PENDING |
|
Sun Feb 9 17:44:20 2025: check job: 2901405 |
|
Sun Feb 9 17:44:25 2025: job 2901405 PENDING |
|
Sun Feb 9 17:49:25 2025: check job: 2901405 |
|
Sun Feb 9 17:49:37 2025: job 2901405 PENDING |
|
Sun Feb 9 17:54:37 2025: check job: 2901405 |
|
Sun Feb 9 17:54:49 2025: job 2901405 PENDING |
|
Sun Feb 9 17:59:49 2025: check job: 2901405 |
|
Sun Feb 9 18:05:06 2025: check job: 2901405 |
|
Sun Feb 9 18:10:20 2025: check job: 2901405 |
|
Sun Feb 9 18:15:38 2025: check job: 2901405 |
|
Sun Feb 9 18:20:54 2025: check job: 2901405 |
|
Sun Feb 9 18:26:03 2025: check job: 2901405 |
|
Sun Feb 9 18:31:19 2025: check job: 2901405 |
|
Sun Feb 9 18:36:35 2025: check job: 2901405 |
|
Sun Feb 9 18:41:40 2025: check job: 2901405 |
|
Sun Feb 9 18:46:55 2025: check job: 2901405 |
|
Sun Feb 9 18:52:10 2025: check job: 2901405 |
|
Sun Feb 9 18:57:22 2025: check job: 2901405 |
|
Sun Feb 9 19:02:32 2025: check job: 2901405 |
|
Sun Feb 9 19:07:46 2025: check job: 2901405 |
|
Sun Feb 9 19:12:59 2025: check job: 2901405 |
|
Sun Feb 9 19:18:18 2025: check job: 2901405 |
|
Sun Feb 9 19:23:33 2025: check job: 2901405 |
|
Sun Feb 9 19:28:44 2025: check job: 2901405 |
|
Sun Feb 9 19:33:56 2025: check job: 2901405 |
|
Sun Feb 9 19:39:08 2025: check job: 2901405 |
|
Sun Feb 9 19:44:20 2025: check job: 2901405 |
|
Sun Feb 9 19:49:34 2025: check job: 2901405 |
|
Sun Feb 9 19:54:46 2025: check job: 2901405 |
|
Sun Feb 9 19:59:57 2025: check job: 2901405 |
|
Sun Feb 9 20:05:10 2025: check job: 2901405 |
|
Sun Feb 9 20:10:22 2025: check job: 2901405 |
|
Sun Feb 9 20:15:34 2025: check job: 2901405 |
|
Sun Feb 9 20:20:46 2025: check job: 2901405 |
|
Sun Feb 9 20:25:58 2025: check job: 2901405 |
|
Sun Feb 9 20:31:10 2025: check job: 2901405 |
|
Sun Feb 9 20:36:22 2025: check job: 2901405 |
|
Sun Feb 9 20:41:33 2025: check job: 2901405 |
|
Sun Feb 9 20:46:45 2025: check job: 2901405 |
|
Sun Feb 9 20:51:57 2025: check job: 2901405 |
|
Sun Feb 9 20:57:09 2025: check job: 2901405 |
|
Sun Feb 9 21:02:22 2025: check job: 2901405 |
|
Sun Feb 9 21:07:35 2025: check job: 2901405 |
|
Sun Feb 9 21:12:47 2025: check job: 2901405 |
|
Sun Feb 9 21:17:59 2025: check job: 2901405 |
|
Sun Feb 9 21:23:11 2025: check job: 2901405 |
|
Sun Feb 9 21:28:22 2025: check job: 2901405 |
|
Sun Feb 9 21:33:34 2025: check job: 2901405 |
|
Sun Feb 9 21:38:47 2025: check job: 2901405 |
|
Sun Feb 9 21:43:59 2025: check job: 2901405 |
|
Sun Feb 9 21:49:12 2025: check job: 2901405 |
|
Sun Feb 9 21:54:24 2025: check job: 2901405 |
|
Sun Feb 9 21:59:36 2025: check job: 2901405 |
|
Sun Feb 9 21:59:47 2025: job 2901405 done |
|
Sun Feb 9 21:59:47 2025: begin monitor job 2901408 |
|
Sun Feb 9 21:59:47 2025: check job: 2901408 |
|
Sun Feb 9 21:59:56 2025: job 2901408 PENDING |
|
Sun Feb 9 22:04:56 2025: check job: 2901408 |
|
Sun Feb 9 22:05:02 2025: job 2901408 PENDING |
|
Sun Feb 9 22:10:02 2025: check job: 2901408 |
|
Sun Feb 9 22:10:13 2025: job 2901408 PENDING |
|
Sun Feb 9 22:15:13 2025: check job: 2901408 |
|
Sun Feb 9 22:20:25 2025: check job: 2901408 |
|
Sun Feb 9 22:25:37 2025: check job: 2901408 |
|
Sun Feb 9 22:30:48 2025: check job: 2901408 |
|
Sun Feb 9 22:36:00 2025: check job: 2901408 |
|
Sun Feb 9 22:41:12 2025: check job: 2901408 |
|
Sun Feb 9 22:46:24 2025: check job: 2901408 |
|
Sun Feb 9 22:51:31 2025: check job: 2901408 |
|
Sun Feb 9 22:56:42 2025: check job: 2901408 |
|
Sun Feb 9 23:01:47 2025: check job: 2901408 |
|
Sun Feb 9 23:06:53 2025: check job: 2901408 |
|
Sun Feb 9 23:12:04 2025: check job: 2901408 |
|
Sun Feb 9 23:17:09 2025: check job: 2901408 |
|
Sun Feb 9 23:22:14 2025: check job: 2901408 |
|
Sun Feb 9 23:27:25 2025: check job: 2901408 |
|
Sun Feb 9 23:32:37 2025: check job: 2901408 |
|
Sun Feb 9 23:37:47 2025: check job: 2901408 |
|
Sun Feb 9 23:43:01 2025: check job: 2901408 |
|
Sun Feb 9 23:48:14 2025: check job: 2901408 |
|
Sun Feb 9 23:53:25 2025: check job: 2901408 |
|
Sun Feb 9 23:58:39 2025: check job: 2901408 |
|
Mon Feb 10 00:03:50 2025: check job: 2901408 |
|
Mon Feb 10 00:09:01 2025: check job: 2901408 |
|
Mon Feb 10 00:14:15 2025: check job: 2901408 |
|
Mon Feb 10 00:19:27 2025: check job: 2901408 |
|
Mon Feb 10 00:24:42 2025: check job: 2901408 |
|
Mon Feb 10 00:29:54 2025: check job: 2901408 |
|
Mon Feb 10 00:35:05 2025: check job: 2901408 |
|
Mon Feb 10 00:40:17 2025: check job: 2901408 |
|
Mon Feb 10 00:45:29 2025: check job: 2901408 |
|
Mon Feb 10 00:50:40 2025: check job: 2901408 |
|
Mon Feb 10 00:55:53 2025: check job: 2901408 |
|
Mon Feb 10 01:01:05 2025: check job: 2901408 |
|
Mon Feb 10 01:06:18 2025: check job: 2901408 |
|
Mon Feb 10 01:11:32 2025: check job: 2901408 |
|
Mon Feb 10 01:16:44 2025: check job: 2901408 |
|
Mon Feb 10 01:21:57 2025: check job: 2901408 |
|
Mon Feb 10 01:27:10 2025: check job: 2901408 |
|
Mon Feb 10 01:32:22 2025: check job: 2901408 |
|
Mon Feb 10 01:37:35 2025: check job: 2901408 |
|
Mon Feb 10 01:42:46 2025: check job: 2901408 |
|
Mon Feb 10 01:48:06 2025: check job: 2901408 |
|
Mon Feb 10 01:53:20 2025: check job: 2901408 |
|
Mon Feb 10 01:58:39 2025: check job: 2901408 |
|
Mon Feb 10 02:03:55 2025: check job: 2901408 |
|
Mon Feb 10 02:09:12 2025: check job: 2901408 |
|
Mon Feb 10 02:14:26 2025: check job: 2901408 |
|
Mon Feb 10 02:14:39 2025: job 2901408 done |
|
Mon Feb 10 02:14:39 2025: begin monitor job 2901410 |
|
Mon Feb 10 02:14:39 2025: check job: 2901410 |
|
Mon Feb 10 02:19:46 2025: check job: 2901410 |
|
Mon Feb 10 02:24:59 2025: check job: 2901410 |
|
Mon Feb 10 02:30:13 2025: check job: 2901410 |
|
Mon Feb 10 02:35:29 2025: check job: 2901410 |
|
Mon Feb 10 02:40:43 2025: check job: 2901410 |
|
Mon Feb 10 02:45:56 2025: check job: 2901410 |
|
Mon Feb 10 02:51:10 2025: check job: 2901410 |
|
Mon Feb 10 02:56:22 2025: check job: 2901410 |
|
Mon Feb 10 03:01:37 2025: check job: 2901410 |
|
Mon Feb 10 03:06:51 2025: check job: 2901410 |
|
Mon Feb 10 03:12:09 2025: check job: 2901410 |
|
Mon Feb 10 03:17:35 2025: check job: 2901410 |
|
Mon Feb 10 03:22:54 2025: check job: 2901410 |
|
Mon Feb 10 03:28:21 2025: check job: 2901410 |
|
Mon Feb 10 03:33:33 2025: check job: 2901410 |
|
Mon Feb 10 03:38:51 2025: check job: 2901410 |
|
Mon Feb 10 03:44:07 2025: check job: 2901410 |
|
Mon Feb 10 03:49:18 2025: check job: 2901410 |
|
Mon Feb 10 03:54:36 2025: check job: 2901410 |
|
Mon Feb 10 03:59:47 2025: check job: 2901410 |
|
Mon Feb 10 04:04:58 2025: check job: 2901410 |
|
Mon Feb 10 04:10:23 2025: check job: 2901410 |
|
Mon Feb 10 04:15:38 2025: check job: 2901410 |
|
Mon Feb 10 04:21:05 2025: check job: 2901410 |
|
Mon Feb 10 04:26:16 2025: check job: 2901410 |
|
Mon Feb 10 04:31:43 2025: check job: 2901410 |
|
Mon Feb 10 04:37:02 2025: check job: 2901410 |
|
Mon Feb 10 04:42:22 2025: check job: 2901410 |
|
Mon Feb 10 04:47:57 2025: check job: 2901410 |
|
Mon Feb 10 04:53:10 2025: check job: 2901410 |
|
Mon Feb 10 04:58:23 2025: check job: 2901410 |
|
Mon Feb 10 05:03:37 2025: check job: 2901410 |
|
Mon Feb 10 05:08:51 2025: check job: 2901410 |
|
Mon Feb 10 05:14:28 2025: check job: 2901410 |
|
Mon Feb 10 05:19:41 2025: check job: 2901410 |
|
Mon Feb 10 05:24:56 2025: check job: 2901410 |
|
Mon Feb 10 05:30:08 2025: check job: 2901410 |
|
Mon Feb 10 05:35:21 2025: check job: 2901410 |
|
Mon Feb 10 05:40:39 2025: check job: 2901410 |
|
Mon Feb 10 05:45:53 2025: check job: 2901410 |
|
Mon Feb 10 05:51:07 2025: check job: 2901410 |
|
Mon Feb 10 05:56:19 2025: check job: 2901410 |
|
Mon Feb 10 06:01:31 2025: check job: 2901410 |
|
Mon Feb 10 06:07:01 2025: check job: 2901410 |
|
Mon Feb 10 06:12:13 2025: check job: 2901410 |
|
Mon Feb 10 06:17:28 2025: check job: 2901410 |
|
Mon Feb 10 06:17:44 2025: job 2901410 done |
|
Mon Feb 10 06:17:44 2025: begin monitor job 2901413 |
|
Mon Feb 10 06:17:44 2025: check job: 2901413 |
|
Mon Feb 10 06:17:50 2025: job 2901413 PENDING |
|
Mon Feb 10 06:22:50 2025: check job: 2901413 |
|
Mon Feb 10 06:23:01 2025: job 2901413 PENDING |
|
Mon Feb 10 06:28:01 2025: check job: 2901413 |
|
Mon Feb 10 06:33:23 2025: check job: 2901413 |
|
Mon Feb 10 06:38:35 2025: check job: 2901413 |
|
Mon Feb 10 06:43:52 2025: check job: 2901413 |
|
Mon Feb 10 06:49:04 2025: check job: 2901413 |
|
Mon Feb 10 06:55:11 2025: check job: 2901413 |
|
Mon Feb 10 07:00:27 2025: check job: 2901413 |
|
Mon Feb 10 07:05:39 2025: check job: 2901413 |
|
Mon Feb 10 07:10:51 2025: check job: 2901413 |
|
Mon Feb 10 07:16:04 2025: check job: 2901413 |
|
Mon Feb 10 07:21:19 2025: check job: 2901413 |
|
Mon Feb 10 07:26:34 2025: check job: 2901413 |
|
Mon Feb 10 07:31:47 2025: check job: 2901413 |
|
Mon Feb 10 07:36:59 2025: check job: 2901413 |
|
Mon Feb 10 07:42:12 2025: check job: 2901413 |
|
Mon Feb 10 07:47:28 2025: check job: 2901413 |
|
Mon Feb 10 07:52:41 2025: check job: 2901413 |
|
Mon Feb 10 07:57:53 2025: check job: 2901413 |
|
Mon Feb 10 08:03:05 2025: check job: 2901413 |
|
Mon Feb 10 08:08:16 2025: check job: 2901413 |
|
Mon Feb 10 08:13:28 2025: check job: 2901413 |
|
Mon Feb 10 08:18:45 2025: check job: 2901413 |
|
Mon Feb 10 08:23:51 2025: check job: 2901413 |
|
Mon Feb 10 08:29:03 2025: check job: 2901413 |
|
Mon Feb 10 08:34:08 2025: check job: 2901413 |
|
Mon Feb 10 08:39:19 2025: check job: 2901413 |
|
Mon Feb 10 08:44:31 2025: check job: 2901413 |
|
Mon Feb 10 08:49:43 2025: check job: 2901413 |
|
Mon Feb 10 08:54:55 2025: check job: 2901413 |
|
Mon Feb 10 09:00:06 2025: check job: 2901413 |
|
Mon Feb 10 09:05:18 2025: check job: 2901413 |
|
Mon Feb 10 09:10:32 2025: check job: 2901413 |
|
Mon Feb 10 09:15:45 2025: check job: 2901413 |
|
Mon Feb 10 09:20:51 2025: check job: 2901413 |
|
Mon Feb 10 09:26:03 2025: check job: 2901413 |
|
Mon Feb 10 09:31:15 2025: check job: 2901413 |
|
Mon Feb 10 09:36:28 2025: check job: 2901413 |
|
Mon Feb 10 09:41:41 2025: check job: 2901413 |
|
Mon Feb 10 09:46:53 2025: check job: 2901413 |
|
Mon Feb 10 09:52:05 2025: check job: 2901413 |
|
Mon Feb 10 09:57:18 2025: check job: 2901413 |
|
Mon Feb 10 10:02:48 2025: check job: 2901413 |
|
Mon Feb 10 10:08:01 2025: check job: 2901413 |
|
Mon Feb 10 10:13:14 2025: check job: 2901413 |
|
Mon Feb 10 10:18:28 2025: check job: 2901413 |
|
Mon Feb 10 10:23:42 2025: check job: 2901413 |
|
Mon Feb 10 10:28:53 2025: check job: 2901413 |
|
Mon Feb 10 10:29:05 2025: job 2901413 done |
|
Mon Feb 10 10:29:05 2025: begin monitor job 2901417 |
|
Mon Feb 10 10:29:05 2025: check job: 2901417 |
|
Mon Feb 10 10:34:11 2025: check job: 2901417 |
|
Mon Feb 10 10:39:23 2025: check job: 2901417 |
|
Mon Feb 10 10:44:36 2025: check job: 2901417 |
|
Mon Feb 10 10:49:49 2025: check job: 2901417 |
|
Mon Feb 10 10:55:00 2025: check job: 2901417 |
|
Mon Feb 10 11:00:11 2025: check job: 2901417 |
|
Mon Feb 10 11:05:22 2025: check job: 2901417 |
|
Mon Feb 10 11:10:37 2025: check job: 2901417 |
|
Mon Feb 10 11:15:48 2025: check job: 2901417 |
|
Mon Feb 10 11:20:59 2025: check job: 2901417 |
|
Mon Feb 10 11:26:14 2025: check job: 2901417 |
|
Mon Feb 10 11:31:27 2025: check job: 2901417 |
|
Mon Feb 10 11:36:37 2025: check job: 2901417 |
|
Mon Feb 10 11:41:49 2025: check job: 2901417 |
|
Mon Feb 10 11:47:02 2025: check job: 2901417 |
|
Mon Feb 10 11:52:15 2025: check job: 2901417 |
|
Mon Feb 10 11:57:27 2025: check job: 2901417 |
|
Mon Feb 10 12:02:44 2025: check job: 2901417 |
|
Mon Feb 10 12:07:59 2025: check job: 2901417 |
|
Mon Feb 10 12:13:04 2025: check job: 2901417 |
|
Mon Feb 10 12:18:15 2025: check job: 2901417 |
|
Mon Feb 10 12:18:28 2025: job 2901417 done |
|
Mon Feb 10 12:18:28 2025: work_dirs/eagle_commercial_llama3_2_3b_eagle_v11_gl_16k_128gpus_ga1 finish training |
|
Mon Feb 10 12:18:29 2025: MiaoTiXing: Access successful. |
|
Mon Feb 10 12:18:30 2025: work_dirs/eagle_commercial_llama3_2_3b_eagle_v11_gl_16k_128gpus_ga1 finish training, start auto testing |
|
Mon Feb 10 12:24:32 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:32.892 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:32.893 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:32.898 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:32 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:32 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:32 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:33.268 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:32 2025: ================ |
|
Mon Feb 10 12:24:32 2025: NOTICE |
|
Mon Feb 10 12:24:32 2025: ================ |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:32 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:32 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:32 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:33.269 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:32 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:36.371 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:36.373 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_DEV_VAL_20250210-121836 |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:37.678 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_DEV_VAL_20250210-121836:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_DEV_VAL_20250210-121836:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_DEV_VAL_20250210-121836/node_command_MMMU_DEV_VAL_20250210-121836.sh & |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:37.683 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_DEV_VAL_20250210-121836/sbatch_MMMU_DEV_VAL_20250210-121836.sh |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:37.688 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_DEV_VAL_20250210-121836/cluster_submit_command_MMMU_DEV_VAL_20250210-121836.sh |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:38.166 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:32 2025: Submitted batch job 2908647 |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:38.166 PM PST][INFO][slurm]: Job Id is 2908647 |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:38.527 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:32 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:41.802 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:41.803 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:41.807 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:32 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:32 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:32 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:42.192 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:32 2025: ================ |
|
Mon Feb 10 12:24:32 2025: NOTICE |
|
Mon Feb 10 12:24:32 2025: ================ |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:32 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:32 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:32 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:42.192 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:32 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:45.212 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:45.217 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ChartQA_TEST_20250210-121845 |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:46.448 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ChartQA_TEST_20250210-121845:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ChartQA_TEST_20250210-121845:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ChartQA_TEST_20250210-121845/node_command_ChartQA_TEST_20250210-121845.sh & |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:46.453 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ChartQA_TEST_20250210-121845/sbatch_ChartQA_TEST_20250210-121845.sh |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:46.458 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ChartQA_TEST_20250210-121845/cluster_submit_command_ChartQA_TEST_20250210-121845.sh |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:47.048 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:32 2025: Submitted batch job 2908648 |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:47.049 PM PST][INFO][slurm]: Job Id is 2908648 |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:47.412 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:32 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:50.689 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:50.690 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:50.694 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:32 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:32 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:32 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:51.056 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:32 2025: ================ |
|
Mon Feb 10 12:24:32 2025: NOTICE |
|
Mon Feb 10 12:24:32 2025: ================ |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:32 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:32 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:32 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:51.056 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:32 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:53.679 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:53.679 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/DocVQA_VAL_20250210-121853 |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:54.732 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/DocVQA_VAL_20250210-121853:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/DocVQA_VAL_20250210-121853:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/DocVQA_VAL_20250210-121853/node_command_DocVQA_VAL_20250210-121853.sh & |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:54.737 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/DocVQA_VAL_20250210-121853/sbatch_DocVQA_VAL_20250210-121853.sh |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:54.741 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/DocVQA_VAL_20250210-121853/cluster_submit_command_DocVQA_VAL_20250210-121853.sh |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:55.233 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:32 2025: Submitted batch job 2908649 |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:55.233 PM PST][INFO][slurm]: Job Id is 2908649 |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:18:55.591 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:32 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:19:28.273 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:19:28.277 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:19:28.297 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:32 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:32 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:32 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:19:28.659 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:32 2025: ================ |
|
Mon Feb 10 12:24:32 2025: NOTICE |
|
Mon Feb 10 12:24:32 2025: ================ |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:32 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:32 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:32 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:19:28.659 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:32 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:32 2025: |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:19:31.603 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:19:31.605 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_Pro_20250210-121931 |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:19:32.721 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_Pro_20250210-121931:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_Pro_20250210-121931:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_Pro_20250210-121931/node_command_MMMU_Pro_20250210-121931.sh & |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:19:32.726 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_Pro_20250210-121931/sbatch_MMMU_Pro_20250210-121931.sh |
|
Mon Feb 10 12:24:32 2025: [2025-02-10 12:19:32.730 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_Pro_20250210-121931/cluster_submit_command_MMMU_Pro_20250210-121931.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:33.203 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:33 2025: Submitted batch job 2908651 |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:33.203 PM PST][INFO][slurm]: Job Id is 2908651 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:33.573 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:33 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:36.873 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:36.874 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:36.879 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:33 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:33 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:33 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:37.236 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:33 2025: ================ |
|
Mon Feb 10 12:24:33 2025: NOTICE |
|
Mon Feb 10 12:24:33 2025: ================ |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:33 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:33 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:33 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:37.236 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:33 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:39.991 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:39.991 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/Video-MME_20250210-121939 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:40.930 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/Video-MME_20250210-121939:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/Video-MME_20250210-121939:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/Video-MME_20250210-121939/node_command_Video-MME_20250210-121939.sh & |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:40.934 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/Video-MME_20250210-121939/sbatch_Video-MME_20250210-121939.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:40.939 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/Video-MME_20250210-121939/cluster_submit_command_Video-MME_20250210-121939.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:41.445 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:33 2025: Submitted batch job 2908653 |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:41.445 PM PST][INFO][slurm]: Job Id is 2908653 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:41.964 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:33 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:45.137 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:45.139 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:48.955 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:48.958 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/HallusionBench_20250210-121948 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:50.102 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/HallusionBench_20250210-121948:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/HallusionBench_20250210-121948:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/HallusionBench_20250210-121948/node_command_HallusionBench_20250210-121948.sh & |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:50.107 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/HallusionBench_20250210-121948/sbatch_HallusionBench_20250210-121948.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:50.111 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/HallusionBench_20250210-121948/cluster_submit_command_HallusionBench_20250210-121948.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:50.724 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:33 2025: Submitted batch job 2908654 |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:50.724 PM PST][INFO][slurm]: Job Id is 2908654 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:51.075 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:33 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:54.509 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:54.510 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:58.029 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:19:58.048 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ScienceQA_TEST_20250210-121958 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:02.058 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ScienceQA_TEST_20250210-121958:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ScienceQA_TEST_20250210-121958:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ScienceQA_TEST_20250210-121958/node_command_ScienceQA_TEST_20250210-121958.sh & |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:02.082 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ScienceQA_TEST_20250210-121958/sbatch_ScienceQA_TEST_20250210-121958.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:02.086 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ScienceQA_TEST_20250210-121958/cluster_submit_command_ScienceQA_TEST_20250210-121958.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:02.738 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:33 2025: Submitted batch job 2908655 |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:02.880 PM PST][INFO][slurm]: Job Id is 2908655 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:04.132 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:33 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:17.630 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:17.633 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:20.731 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:20.733 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MathVista_MINI_20250210-122020 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:21.815 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MathVista_MINI_20250210-122020:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MathVista_MINI_20250210-122020:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MathVista_MINI_20250210-122020/node_command_MathVista_MINI_20250210-122020.sh & |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:21.821 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MathVista_MINI_20250210-122020/sbatch_MathVista_MINI_20250210-122020.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:21.827 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MathVista_MINI_20250210-122020/cluster_submit_command_MathVista_MINI_20250210-122020.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:22.357 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:33 2025: Submitted batch job 2908657 |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:22.357 PM PST][INFO][slurm]: Job Id is 2908657 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:22.697 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:33 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:25.951 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:25.952 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:29.096 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:29.096 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ChartQA_TEST_20250210-122029 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:30.058 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ChartQA_TEST_20250210-122029:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ChartQA_TEST_20250210-122029:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ChartQA_TEST_20250210-122029/node_command_ChartQA_TEST_20250210-122029.sh & |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:30.063 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ChartQA_TEST_20250210-122029/sbatch_ChartQA_TEST_20250210-122029.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:30.068 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/ChartQA_TEST_20250210-122029/cluster_submit_command_ChartQA_TEST_20250210-122029.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:30.618 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:33 2025: Submitted batch job 2908658 |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:30.618 PM PST][INFO][slurm]: Job Id is 2908658 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:30.968 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:33 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:34.161 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:34.162 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:37.555 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:37.557 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/TextVQA_VAL_20250210-122037 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:38.708 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/TextVQA_VAL_20250210-122037:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/TextVQA_VAL_20250210-122037:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/TextVQA_VAL_20250210-122037/node_command_TextVQA_VAL_20250210-122037.sh & |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:38.712 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/TextVQA_VAL_20250210-122037/sbatch_TextVQA_VAL_20250210-122037.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:38.717 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/TextVQA_VAL_20250210-122037/cluster_submit_command_TextVQA_VAL_20250210-122037.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:39.198 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:33 2025: Submitted batch job 2908659 |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:39.198 PM PST][INFO][slurm]: Job Id is 2908659 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:39.545 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:33 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:42.928 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:42.930 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:42.934 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:33 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:33 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:33 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:43.310 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:33 2025: ================ |
|
Mon Feb 10 12:24:33 2025: NOTICE |
|
Mon Feb 10 12:24:33 2025: ================ |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:33 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:33 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:33 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:43.310 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:33 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:46.143 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:46.143 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/SEEDBench_IMG_20250210-122046 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:47.129 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/SEEDBench_IMG_20250210-122046:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/SEEDBench_IMG_20250210-122046:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/SEEDBench_IMG_20250210-122046/node_command_SEEDBench_IMG_20250210-122046.sh & |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:47.134 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/SEEDBench_IMG_20250210-122046/sbatch_SEEDBench_IMG_20250210-122046.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:47.139 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/SEEDBench_IMG_20250210-122046/cluster_submit_command_SEEDBench_IMG_20250210-122046.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:47.608 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:33 2025: Submitted batch job 2908660 |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:47.609 PM PST][INFO][slurm]: Job Id is 2908660 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:47.959 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:33 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:51.174 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:51.176 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:54.768 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:54.786 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_DEV_EN_V11_20250210-122054 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:58.091 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_DEV_EN_V11_20250210-122054:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_DEV_EN_V11_20250210-122054:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_DEV_EN_V11_20250210-122054/node_command_MMBench_DEV_EN_V11_20250210-122054.sh & |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:58.118 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_DEV_EN_V11_20250210-122054/sbatch_MMBench_DEV_EN_V11_20250210-122054.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:58.123 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_DEV_EN_V11_20250210-122054/cluster_submit_command_MMBench_DEV_EN_V11_20250210-122054.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:58.728 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:33 2025: Submitted batch job 2908661 |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:20:58.776 PM PST][INFO][slurm]: Job Id is 2908661 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:21:00.417 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:33 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:21:50.916 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:21:50.963 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:21:55.717 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:21:55.734 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_DEV_CN_V11_20250210-122155 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:21:58.770 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_DEV_CN_V11_20250210-122155:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_DEV_CN_V11_20250210-122155:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_DEV_CN_V11_20250210-122155/node_command_MMBench_DEV_CN_V11_20250210-122155.sh & |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:21:58.875 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_DEV_CN_V11_20250210-122155/sbatch_MMBench_DEV_CN_V11_20250210-122155.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:21:58.904 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_DEV_CN_V11_20250210-122155/cluster_submit_command_MMBench_DEV_CN_V11_20250210-122155.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:22:00.293 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:33 2025: Submitted batch job 2908670 |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:22:00.353 PM PST][INFO][slurm]: Job Id is 2908670 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:22:01.770 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:33 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:22:07.975 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:22:07.976 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:22:11.227 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:22:11.230 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_DEV_VAL_20250210-122211 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:22:12.500 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_DEV_VAL_20250210-122211:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_DEV_VAL_20250210-122211:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_DEV_VAL_20250210-122211/node_command_MMMU_DEV_VAL_20250210-122211.sh & |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:22:12.505 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_DEV_VAL_20250210-122211/sbatch_MMMU_DEV_VAL_20250210-122211.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:22:12.509 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMMU_DEV_VAL_20250210-122211/cluster_submit_command_MMMU_DEV_VAL_20250210-122211.sh |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:22:12.989 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:33 2025: Submitted batch job 2908671 |
|
Mon Feb 10 12:24:33 2025: |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:22:12.990 PM PST][INFO][slurm]: Job Id is 2908671 |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:22:13.343 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:33 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:22:18.568 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:33 2025: [2025-02-10 12:22:18.569 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:21.739 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:21.747 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/AI2D_TEST_20250210-122221 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:22.890 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/AI2D_TEST_20250210-122221:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/AI2D_TEST_20250210-122221:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/AI2D_TEST_20250210-122221/node_command_AI2D_TEST_20250210-122221.sh & |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:22.895 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/AI2D_TEST_20250210-122221/sbatch_AI2D_TEST_20250210-122221.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:22.899 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/AI2D_TEST_20250210-122221/cluster_submit_command_AI2D_TEST_20250210-122221.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:23.368 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:34 2025: Submitted batch job 2908672 |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:23.368 PM PST][INFO][slurm]: Job Id is 2908672 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:23.745 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:34 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:26.991 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:26.993 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:30.147 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:30.148 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/InfoVQA_VAL_20250210-122230 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:31.095 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/InfoVQA_VAL_20250210-122230:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/InfoVQA_VAL_20250210-122230:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/InfoVQA_VAL_20250210-122230/node_command_InfoVQA_VAL_20250210-122230.sh & |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:31.100 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/InfoVQA_VAL_20250210-122230/sbatch_InfoVQA_VAL_20250210-122230.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:31.106 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/InfoVQA_VAL_20250210-122230/cluster_submit_command_InfoVQA_VAL_20250210-122230.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:32.044 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:34 2025: Submitted batch job 2908673 |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:32.045 PM PST][INFO][slurm]: Job Id is 2908673 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:32.398 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:34 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:40.567 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:40.568 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:43.603 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:43.605 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/OCRBench_20250210-122243 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:44.733 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/OCRBench_20250210-122243:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/OCRBench_20250210-122243:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/OCRBench_20250210-122243/node_command_OCRBench_20250210-122243.sh & |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:44.738 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/OCRBench_20250210-122243/sbatch_OCRBench_20250210-122243.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:44.743 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/OCRBench_20250210-122243/cluster_submit_command_OCRBench_20250210-122243.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:45.214 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:34 2025: Submitted batch job 2908674 |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:45.215 PM PST][INFO][slurm]: Job Id is 2908674 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:45.567 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:34 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:48.859 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:48.860 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:48.864 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:34 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:49.232 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: NOTICE |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:34 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:34 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:34 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:49.232 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:34 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:51.955 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:51.955 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/RealWorldQA_20250210-122251 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:54.856 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/RealWorldQA_20250210-122251:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/RealWorldQA_20250210-122251:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/RealWorldQA_20250210-122251/node_command_RealWorldQA_20250210-122251.sh & |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:54.866 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/RealWorldQA_20250210-122251/sbatch_RealWorldQA_20250210-122251.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:54.871 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/RealWorldQA_20250210-122251/cluster_submit_command_RealWorldQA_20250210-122251.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:55.451 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:34 2025: Submitted batch job 2908675 |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:55.476 PM PST][INFO][slurm]: Job Id is 2908675 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:22:56.578 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:34 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:02.106 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:02.108 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:02.115 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:34 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:02.503 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: NOTICE |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:34 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:34 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:34 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:02.503 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:34 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:05.919 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:05.934 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/POPE_20250210-122305 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:07.650 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/POPE_20250210-122305:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/POPE_20250210-122305:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/POPE_20250210-122305/node_command_POPE_20250210-122305.sh & |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:07.656 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/POPE_20250210-122305/sbatch_POPE_20250210-122305.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:07.660 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/POPE_20250210-122305/cluster_submit_command_POPE_20250210-122305.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:08.205 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:34 2025: Submitted batch job 2908676 |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:08.205 PM PST][INFO][slurm]: Job Id is 2908676 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:08.573 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:34 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:12.052 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:12.053 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:12.057 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:34 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:12.435 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: NOTICE |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:34 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:34 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:34 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:12.436 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:34 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:15.691 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:15.694 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMVet_20250210-122315 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:16.872 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMVet_20250210-122315:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMVet_20250210-122315:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMVet_20250210-122315/node_command_MMVet_20250210-122315.sh & |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:16.877 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMVet_20250210-122315/sbatch_MMVet_20250210-122315.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:16.881 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMVet_20250210-122315/cluster_submit_command_MMVet_20250210-122315.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:17.451 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:34 2025: Submitted batch job 2908677 |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:17.451 PM PST][INFO][slurm]: Job Id is 2908677 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:18.228 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:34 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:23.271 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:23.272 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:23.278 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:34 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:23.875 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: NOTICE |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:34 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:34 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:34 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:23.875 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:34 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:26.855 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:26.857 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MME_20250210-122326 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:27.978 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MME_20250210-122326:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MME_20250210-122326:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MME_20250210-122326/node_command_MME_20250210-122326.sh & |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:27.982 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MME_20250210-122326/sbatch_MME_20250210-122326.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:27.987 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MME_20250210-122326/cluster_submit_command_MME_20250210-122326.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:28.570 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:34 2025: Submitted batch job 2908678 |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:28.570 PM PST][INFO][slurm]: Job Id is 2908678 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:28.925 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:34 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:32.284 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:32.285 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:32.289 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:34 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:32.678 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: NOTICE |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:34 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:34 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:34 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:32.678 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:34 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:35.901 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:35.912 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMStar_20250210-122335 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:37.236 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMStar_20250210-122335:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMStar_20250210-122335:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMStar_20250210-122335/node_command_MMStar_20250210-122335.sh & |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:37.241 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMStar_20250210-122335/sbatch_MMStar_20250210-122335.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:37.247 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMStar_20250210-122335/cluster_submit_command_MMStar_20250210-122335.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:37.829 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:34 2025: Submitted batch job 2908679 |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:37.829 PM PST][INFO][slurm]: Job Id is 2908679 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:38.187 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:34 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:43.467 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:43.470 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:43.479 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:34 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:43.864 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: NOTICE |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:34 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:34 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:34 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:43.864 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:34 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:46.839 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:46.841 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/CCBench_20250210-122346 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:47.959 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/CCBench_20250210-122346:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/CCBench_20250210-122346:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/CCBench_20250210-122346/node_command_CCBench_20250210-122346.sh & |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:47.964 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/CCBench_20250210-122346/sbatch_CCBench_20250210-122346.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:47.968 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/CCBench_20250210-122346/cluster_submit_command_CCBench_20250210-122346.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:48.467 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:34 2025: Submitted batch job 2908680 |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:48.467 PM PST][INFO][slurm]: Job Id is 2908680 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:49.115 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:34 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:52.659 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:52.661 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:52.664 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:34 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:53.187 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: NOTICE |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:34 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:34 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:34 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:53.187 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:34 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:56.439 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:56.439 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_TEST_CN_V11_20250210-122356 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:58.671 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_TEST_CN_V11_20250210-122356:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_TEST_CN_V11_20250210-122356:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_TEST_CN_V11_20250210-122356/node_command_MMBench_TEST_CN_V11_20250210-122356.sh & |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:58.676 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_TEST_CN_V11_20250210-122356/sbatch_MMBench_TEST_CN_V11_20250210-122356.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:58.680 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_TEST_CN_V11_20250210-122356/cluster_submit_command_MMBench_TEST_CN_V11_20250210-122356.sh |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:59.312 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:34 2025: Submitted batch job 2908681 |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:23:59.312 PM PST][INFO][slurm]: Job Id is 2908681 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:24:00.020 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:34 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:24:03.723 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:24:03.724 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:24:03.728 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:34 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:34 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:24:04.185 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: NOTICE |
|
Mon Feb 10 12:24:34 2025: ================ |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:34 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:34 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:34 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:24:04.185 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:34 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:34 2025: |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:24:07.611 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:24:07.613 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_TEST_EN_V11_20250210-122407 |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:24:08.829 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_TEST_EN_V11_20250210-122407:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_TEST_EN_V11_20250210-122407:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_TEST_EN_V11_20250210-122407/node_command_MMBench_TEST_EN_V11_20250210-122407.sh & |
|
Mon Feb 10 12:24:34 2025: [2025-02-10 12:24:08.834 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_TEST_EN_V11_20250210-122407/sbatch_MMBench_TEST_EN_V11_20250210-122407.sh |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:08.842 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/MMBench_TEST_EN_V11_20250210-122407/cluster_submit_command_MMBench_TEST_EN_V11_20250210-122407.sh |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:09.331 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:35 2025: Submitted batch job 2908682 |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:09.331 PM PST][INFO][slurm]: Job Id is 2908682 |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:09.721 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:35 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:15.955 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:15.956 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:15.967 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:35 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:35 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:35 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:16.478 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:35 2025: ================ |
|
Mon Feb 10 12:24:35 2025: NOTICE |
|
Mon Feb 10 12:24:35 2025: ================ |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:35 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:35 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:35 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:16.478 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:35 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:19.259 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:19.262 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/DocVQA_TEST_20250210-122419 |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:20.381 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/DocVQA_TEST_20250210-122419:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/DocVQA_TEST_20250210-122419:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/DocVQA_TEST_20250210-122419/node_command_DocVQA_TEST_20250210-122419.sh & |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:20.386 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/DocVQA_TEST_20250210-122419/sbatch_DocVQA_TEST_20250210-122419.sh |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:20.391 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/DocVQA_TEST_20250210-122419/cluster_submit_command_DocVQA_TEST_20250210-122419.sh |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:20.970 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:35 2025: Submitted batch job 2908684 |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:20.970 PM PST][INFO][slurm]: Job Id is 2908684 |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:21.533 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
Mon Feb 10 12:24:35 2025: CLUSTER=NSS CLUSTER_STACK=NSS CLUSTER_NAME=DRACO_OCI_IAD subdir=nss |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:25.167 PM PST][INFO][load_config]: Loading NSS config: /home/adlr/adlr-utils/release/cluster-interface/latest/nss/config-draco-oci.json |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:25.168 PM PST][INFO][load_config]: Overriding from env: NSS_ADLR_PYTHON=NSSSUB_ADLR_UTILS_ENV_ROOT/python/latest/bin/python |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:25.172 PM PST][WARNING][submit_slurm_parent]: |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:35 2025: This host seems to be a data copier or container build node, which usually will not be set up to submit jobs. |
|
Mon Feb 10 12:24:35 2025: Did you possibly mean to do it from a login node instead? |
|
Mon Feb 10 12:24:35 2025: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:25.784 PM PST][WARNING][submit_job]: |
|
Mon Feb 10 12:24:35 2025: ================ |
|
Mon Feb 10 12:24:35 2025: NOTICE |
|
Mon Feb 10 12:24:35 2025: ================ |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: The log directory structure will be changing in an upcoming release. |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: Please use the temporary `--preview_new_logdir` option to try it out with your jobs beforehand. |
|
Mon Feb 10 12:24:35 2025: After the preview period, the new structure will be used for all new jobs (except autoresume follow-ups, which will keep |
|
Mon Feb 10 12:24:35 2025: their original job's log directory structure). |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: Please be advised that some file locations may change due to the new structure, |
|
Mon Feb 10 12:24:35 2025: but user code should be unaffected in most cases. |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: If you encounter any issues or have feedback, please reach out to `@adlr-support` in Slack. |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:25.784 PM PST][WARNING][submit_job]: `--autoresume_method` is deprecated and will be removed in a future release, when all follow-ups use the requeue method. |
|
Mon Feb 10 12:24:35 2025: Please reach out to `@adlr-support` in Slack if you rely on it and need to discuss options. |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:28.763 PM PST][INFO][submit_slurm_parent]: Forcing exclusive mode |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:28.765 PM PST][INFO][submit_job]: Creating the logdir: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/InfoVQA_TEST_20250210-122428 |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:30.213 PM PST][INFO][submit_slurm_parent]: srun_commands=srun --kill-on-bad-exit=1 --container-image=/home/zhidingy/workspace/eagle2/torch2_test.sqsh --container-mounts=/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/python:ro,/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:/lustre/fs11/portfolios/adlr/projects/adlr_other_infra/release/cluster-interface/13.11_2025-02-05_11-20-02:ro,/home/adlr/adlr-utils/release/cluster-interface/latest:/home/adlr/adlr-utils/release/cluster-interface/latest:ro,/dev/fuse:/dev/fuse:rw,/home/zhidingy:/home/zhidingy:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/InfoVQA_TEST_20250210-122428:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/InfoVQA_TEST_20250210-122428:rw,/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:/lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval:rw,/home/:/home/:rw,/lustre:/lustre:rw /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/InfoVQA_TEST_20250210-122428/node_command_InfoVQA_TEST_20250210-122428.sh & |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:30.218 PM PST][INFO][submit_job]: Details of submit command: sbatch /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/InfoVQA_TEST_20250210-122428/sbatch_InfoVQA_TEST_20250210-122428.sh |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:30.223 PM PST][INFO][utils]: Executing command: /lustre/fs12/portfolios/llmservice/users/zhidingy/vlmeval/work_dirs/eval/Eagle-Next/InfoVQA_TEST_20250210-122428/cluster_submit_command_InfoVQA_TEST_20250210-122428.sh |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:30.799 PM PST][INFO][utils]: Stdout: |
|
Mon Feb 10 12:24:35 2025: Submitted batch job 2908685 |
|
Mon Feb 10 12:24:35 2025: |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:30.799 PM PST][INFO][slurm]: Job Id is 2908685 |
|
Mon Feb 10 12:24:35 2025: [2025-02-10 12:24:31.203 PM PST][INFO][submit_job]: Non blocking execution - job has been submitted. |
|
|